Agentforce · June 12, 2026 · 12 min read

Agentforce Voice: The Complete 2026 Guide to AI Phone Agents

How Salesforce's voice agents answer real phone calls, what went GA in Summer '26 including SIP routing and the Mobile SDK, and what to check before you put one in front of customers.

By Dipojjal Chakrabarti · Founder & Editor, Salesforce Dictionary · Last updated Jun 12, 2026

A customer calls your support line at 7:40 pm to reschedule a technician visit. The IVR offers her nine options, none of which is "reschedule a technician visit." She presses zero, waits eleven minutes, and hangs up. The next morning she opens a case by email instead, your CSAT survey takes a hit, and the technician still shows up to an empty house.

That phone tree is the thing Agentforce Voice was built to replace. It puts an AI agent directly on the phone line: it picks up, listens to what the caller actually says, looks up the appointment in Salesforce, moves it, confirms the new time out loud, and logs the whole exchange against the contact record. No menu. No hold music. And when the situation needs a human, it transfers the call with the transcript and the caller's mood attached, so nobody has to repeat themselves.

Salesforce announced Agentforce Voice in October 2025, timed for Dreamforce, and spent the following two releases hardening it. The Summer '26 release is the point where it stopped being a keynote demo and became something you can defend in an architecture review: voice support in the Agentforce Mobile SDK went GA for iOS, Android, and React Native, and calls can now route over SIP instead of requiring a PSTN number. This guide covers how the product actually works, what changed in Summer '26, what it costs, and the failure modes you should test before a single customer hears it.

What Agentforce Voice Actually Is

Agentforce Voice is the phone channel for Agentforce. The same agent metadata you build in Agentforce Builder (the topics, the actions, the instructions) gets a voice in front of it. When a call comes in, the platform transcribes the caller's speech in real time, runs the words through the Atlas reasoning engine to pick a topic and plan actions, executes those actions against your org, and speaks the result back with synthesized speech.

The pitch Salesforce leads with is emotional range. The launch messaging promised AI that understands emotion, nuance, and intent, which in practice means the agent detects sentiment from tone and word choice, adjusts its phrasing, and treats rising frustration as an escalation signal rather than a transcription artifact. One Salesforce executive framed the design goal bluntly: "You can have the creativity and fluidity when you want it, or the rigidity, consistency, and scale when you don't."

The competitive subtext matters too. Agentforce Voice competes head-on with Sierra, the customer service AI startup co-founded by former Salesforce co-CEO Bret Taylor. Salesforce's counterargument is location: the voice agent lives where your cases, orders, and entitlements already live, so there is no integration project standing between the caller's question and the answer.

How a Call Flows Through the System

Understanding the pipeline makes every design decision downstream easier. Here is what happens between "phone rings" and "caller hangs up happy."

Agentforce Voice call pipeline from caller through telephony, transcription, Atlas reasoning, and actions back to speech

  1. The call arrives over telephony. Either a PSTN phone number or, new in Summer '26, a SIP trunk delivers the audio. Salesforce's own release notes position SIP as the cheaper, more flexible option since it rides your existing internet connection instead of a dedicated carrier number.
  2. Speech becomes text. Real-time transcription converts the caller's words into a running transcript. This transcript is also what gets stored, searched, and analyzed later.
  3. Atlas reasons over the words. The reasoning engine classifies the caller's intent against your agent's topics, retrieves context from CRM records and knowledge articles, and plans which actions to run. The same topic-and-action model you already know from chat agents applies unchanged.
  4. Actions execute. Flows, Apex invocable methods, prompt templates, and API callouts do the actual work: reschedule the appointment, check the order status, update the contact's phone number.
  5. Text becomes speech. The response is synthesized into natural-sounding audio and spoken back. Latency matters enormously here; a two-second pause that nobody notices in a chat window feels like a dropped call on the phone.
  6. Escalation, if needed. When the agent hits a guardrail, a low-confidence intent, or an angry caller, it transfers to a human through Omni-Channel with the transcript, the context, and the sentiment readout attached.
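To make the six steps concrete, here is a minimal Python sketch of a single conversational turn. Every function in it is a hypothetical stand-in, not a Salesforce API: `transcribe` plays the role of real-time speech-to-text, `classify_intent` is a toy keyword matcher standing in for Atlas topic selection, and the `actions` dictionary stands in for Flows and Apex. It shows the control flow, including the escalation branch, not how the product is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Turn:
    """What one caller turn produces; in the product this is written back to CRM."""
    transcript: str
    topic: Optional[str] = None
    actions_run: List[str] = field(default_factory=list)
    reply: str = ""
    escalated: bool = False


def transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for real-time speech-to-text (step 2).
    return audio.decode("utf-8")


def classify_intent(text: str, topics: Dict[str, List[str]]) -> Optional[str]:
    # Toy keyword matcher standing in for Atlas topic classification (step 3).
    lowered = text.lower()
    for topic, keywords in topics.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return None


def synthesize(text: str) -> bytes:
    # Hypothetical stand-in for text-to-speech (step 5).
    return text.encode("utf-8")


def handle_turn(audio: bytes,
                topics: Dict[str, List[str]],
                actions: Dict[str, Callable[[str], str]]) -> Tuple[Turn, bytes]:
    turn = Turn(transcript=transcribe(audio))
    turn.topic = classify_intent(turn.transcript, topics)
    if turn.topic is None or turn.topic not in actions:
        # Step 6: no usable topic, so hand off instead of guessing.
        turn.escalated = True
        turn.reply = "Let me get you to a person who can help."
    else:
        # Step 4: run the action bound to the matched topic.
        turn.reply = actions[turn.topic](turn.transcript)
        turn.actions_run.append(turn.topic)
    return turn, synthesize(turn.reply)
```

The point worth noticing is that the `Turn` record carries the transcript, topic, and actions together, which is what makes the write-back and the warm handoff possible later.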

Every step writes back to Salesforce. The call becomes a record: transcribed, logged, linked to the contact, and available to reporting. That is the quiet strategic play. Salesforce wants voice to stop being a cost center that vanishes into call recordings nobody listens to, and start being a searchable data source that improves the next thousand calls.

What Went GA in Summer '26

The Summer '26 release, live in production orgs from June 15, moved three voice capabilities from preview to GA. If you evaluated Agentforce Voice in 2025 and passed, these are the items that change the math.

Voice in the Agentforce Mobile SDK. You can now embed a voice-to-voice agent inside your own iOS and Android apps using the Agentforce Mobile SDK, which is itself GA in Summer '26. The SDK supports native iOS, native Android, and React Native with feature parity across all three. On iOS, speech-to-text and text-to-speech engines are injectable through Swift optionals, so you can swap in your own STT or TTS provider if the defaults do not fit. The practical effect: the "call us" button in your app can become a conversation with an agent that already knows who the user is, because they are authenticated.
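The SDK's actual injection points are Swift-specific and are not reproduced here. As a language-neutral illustration of the underlying pattern (optional engines that fall back to built-in defaults when you pass nothing), here is a Python sketch using `typing.Protocol`. `SpeechToText`, `TextToSpeech`, `DefaultSTT`, `DefaultTTS`, and `VoiceSession` are all hypothetical names, not SDK types.

```python
from typing import Callable, Optional, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    def speak(self, text: str) -> bytes: ...


class DefaultSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # placeholder for a built-in engine


class DefaultTTS:
    def speak(self, text: str) -> bytes:
        return text.encode("utf-8")  # placeholder for a built-in engine


class VoiceSession:
    """Hypothetical session: pass None for either engine to keep the default."""

    def __init__(self, stt: Optional[SpeechToText] = None,
                 tts: Optional[TextToSpeech] = None) -> None:
        self.stt = stt if stt is not None else DefaultSTT()
        self.tts = tts if tts is not None else DefaultTTS()

    def round_trip(self, audio: bytes, reply_for: Callable[[str], str]) -> bytes:
        # Caller audio -> text -> agent reply -> synthesized audio.
        text = self.stt.transcribe(audio)
        return self.tts.speak(reply_for(text))
```

Swapping providers then means constructing the session with your own object, for example `VoiceSession(tts=MyVendorTTS())`, while the other engine stays on its default.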

SIP routing. Before Summer '26, inbound calls could only reach an Agentforce Service Agent through a PSTN phone number. Now the Agentforce Voice setup page has a SIP tab that walks you through routing calls over Session Initiation Protocol. For organizations already running SIP trunks into their contact center, this removes a carrier dependency and a line item. The feature applies to Lightning Experience in Enterprise, Performance, Unlimited, and Developer Editions with Foundations or Agentforce 1 Editions.

Multi-agent orchestration. Also GA on June 15, orchestration lets a voice agent delegate to specialist agents mid-call. The caller asks about a refund and a delivery date in one breath; the orchestrator fans the work out and assembles one spoken answer. We covered the orchestration model in depth in our multi-agent orchestration guide, and all of it now applies to the phone channel.
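The fan-out-and-assemble idea can be sketched with Python's `asyncio`. This is not the Agentforce orchestration API; `refund_agent` and `delivery_agent` are hypothetical specialists with canned replies, and the sketch assumes the compound request has already been split into `(topic, utterance)` pairs.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Specialist = Callable[[str], Awaitable[str]]


async def refund_agent(utterance: str) -> str:
    await asyncio.sleep(0.01)  # simulate a CRM lookup
    return "Your refund has been processed"


async def delivery_agent(utterance: str) -> str:
    await asyncio.sleep(0.01)  # simulate an order-system lookup
    return "your order arrives on Friday"


async def orchestrate(sub_requests: List[Tuple[str, str]],
                      specialists: Dict[str, Specialist]) -> str:
    # Run every specialist concurrently so total latency is roughly the slowest
    # one, not the sum. gather() returns results in input order, so the spoken
    # answer follows the order in which the caller asked.
    answers = await asyncio.gather(
        *(specialists[topic](utterance) for topic, utterance in sub_requests))
    return ", and ".join(answers) + "."
```

Running the specialists concurrently matters more on voice than on chat, because every extra round trip is dead air on the line.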

Phone Trees Had a Good Run. It's Over.

The fairest way to evaluate Agentforce Voice is against the thing it replaces, because almost nobody is replacing human agents with it on day one. They are replacing IVR.

Comparison of a rigid IVR phone tree against an Agentforce Voice agent that resolves intent in one exchange

An IVR is a decision tree the caller has to walk blind. It can only handle intents someone predicted at design time, it punishes anyone whose problem spans two branches, and its only data output is which buttons got pressed. A voice agent inverts each of those properties. The caller states the problem in their own words. Intent classification handles phrasing the designer never imagined. Compound requests get decomposed instead of forcing two calls. And the output is a full transcript with sentiment, linked to CRM records.

The honest caveat: an IVR never hallucinates. A phone tree will never confidently tell a caller their warranty covers something it does not. A voice agent can, which is why grounding, guardrails, and testing carry more weight on this channel than on chat, where the user can at least re-read the answer. Treat that asymmetry as the central design constraint, not a footnote.

Deployment Options and Telephony

Agentforce Voice does not replace your telephony stack; it plugs into it. As of mid-2026 there are three deployment paths.

Three deployment channels for Agentforce Voice: contact center telephony partners, SIP trunks, and embedded mobile apps via the SDK

Through a contact center partner. Agentforce Voice integrates with the major CCaaS providers: Amazon Connect, Five9, NICE, Genesys, and Vonage. If you already run Service Cloud Voice with one of these, the voice agent slots in as the first responder on the line, with your existing routing as the escalation target. Prerequisites are what you would expect: Enterprise Edition or above, a Service Cloud license, Voice enabled, a telephony partner connected, and Agentforce turned on.

Direct over PSTN or SIP. For simpler setups, you point a phone number (or, since Summer '26, a SIP trunk) at an Agentforce Service Agent and configure the connection from the Agentforce Voice setup page. The wiring between telephony and agent runs through an Omni-Channel flow, which is also where you define what happens on escalation.

Embedded in your mobile app. The Mobile SDK path skips the phone network entirely. The voice conversation happens in-app over the data connection, authenticated, with the agent able to read the user's records without asking who they are. For B2C orgs this is the most interesting option of the three, because it merges the convenience of a call with the identity context of a logged-in session.

One constraint to plan around: regional availability of the voice channel was limited to the United States and Canada as of January 2026. Check current availability for your region before you commit a roadmap to it, because language and region support has been expanding release by release but is not universal.

Building One: The Parts You Already Know, Plus Two New Habits

If you have built an Agentforce Service Agent for chat, you know most of the construction already. Topics scope what the agent handles. Actions, built from Flows, Apex, and prompt templates, do the work. Instructions shape tone and boundaries. Grounding in knowledge articles and Data Cloud keeps answers tied to facts. Our custom actions guide covers that layer.

Voice adds two habits on top.

Write for the ear, not the eye. A chat agent can answer with a table and three links. A voice agent that reads out a table is torture. Instructions for voice topics should push short sentences, one piece of information at a time, and explicit confirmation of anything consequential: "I've moved your appointment to Thursday the 18th between 9 and 11 am. Did I get that right?" Numbers, dates, and names deserve read-back confirmation because transcription errors concentrate exactly there.
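When a consequential detail comes out of an action, it helps to render the read-back deterministically rather than leave date phrasing to chance. The helper below is a hypothetical sketch (`read_back`, `ordinal`, and `spoken_hour` are illustrative names, not platform functions) that turns an appointment window into the kind of spoken confirmation quoted above.

```python
from datetime import datetime


def ordinal(day: int) -> str:
    # 11th, 12th, 13th are irregular; otherwise the last digit decides.
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def spoken_hour(dt: datetime) -> str:
    # 12-hour clock, dropping ":00" because "9 o'clock" reads better than "9:00".
    hour = dt.hour % 12 or 12
    return f"{hour}" if dt.minute == 0 else f"{hour}:{dt.minute:02d}"


def read_back(start: datetime, end: datetime) -> str:
    day = f"{start.strftime('%A')} the {ordinal(start.day)}"
    start_mer = "am" if start.hour < 12 else "pm"
    end_mer = "am" if end.hour < 12 else "pm"
    if start_mer == end_mer:
        # Say the meridiem once when both ends share it: "between 9 and 11 am".
        window = f"between {spoken_hour(start)} and {spoken_hour(end)} {end_mer}"
    else:
        window = (f"between {spoken_hour(start)} {start_mer} "
                  f"and {spoken_hour(end)} {end_mer}")
    return f"I've moved your appointment to {day} {window}. Did I get that right?"
```

For example, `read_back(datetime(2026, 6, 18, 9), datetime(2026, 6, 18, 11))` produces the confirmation sentence used above, ending in an explicit question so the caller can correct a misheard date.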

Test with audio, not text. Typing test utterances into a chat preview tells you the reasoning works. It tells you nothing about barge-in behavior, background noise, accents, or what happens when a caller talks over the agent. Run real calls against the agent before launch, and wire scenario suites into Agentforce Testing Center so regressions surface when you edit topics later, not when a customer finds them.

Escalation Is the Feature That Sells It

Ask any contact center lead what they hate about IVR deflection, and the answer is the cold handoff: the caller fights the machine, finally reaches a human, and starts over from zero. Agentforce Voice's handoff is the part of the demo that lands hardest because it fixes exactly that.

Escalation handoff from Agentforce Voice to a human agent through Omni-Channel with transcript, context, and sentiment attached

When the agent escalates, the human picks up with the live transcript, the records the agent already pulled, the actions it already attempted, and a sentiment readout on the caller. Routing goes through Omni-Channel, so your existing queues, skills, and capacity rules apply. Design your escalation triggers deliberately: low intent confidence, repeated re-phrasings, negative sentiment crossing a threshold, and any topic you have classified as high-stakes (payments, cancellations, anything regulated). The cheapest trust you will ever buy is an agent that says "let me get you to a person" one exchange before the caller would have demanded it.
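The four triggers above can be written down as one explicit policy. This is a hypothetical Python sketch, not a platform setting: the field names, the 0.0 to 1.0 confidence scale, the -1.0 to 1.0 sentiment scale, and every threshold value are assumptions you would tune from your own call data. Returning a reason string rather than a bare boolean means the reason can travel with the handoff.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical classification of topics that should always reach a human.
HIGH_STAKES_TOPICS = {"payments", "cancellations"}


@dataclass
class CallState:
    topic: str
    intent_confidence: float  # assumed 0.0-1.0
    rephrase_count: int       # times the caller has restated the same request
    sentiment: float          # assumed -1.0 (angry) to 1.0 (happy)


def escalation_reason(state: CallState,
                      min_confidence: float = 0.6,
                      max_rephrases: int = 2,
                      sentiment_floor: float = -0.4) -> Optional[str]:
    # Checked in priority order; the first matching rule wins.
    if state.topic in HIGH_STAKES_TOPICS:
        return "high-stakes topic"
    if state.intent_confidence < min_confidence:
        return "low intent confidence"
    if state.rephrase_count >= max_rephrases:
        return "repeated rephrasing"
    if state.sentiment < sentiment_floor:
        return "negative sentiment"
    return None  # agent keeps the call
```

Checking the high-stakes rule first is a deliberate choice: a payment question goes to a person even when the agent is confident and the caller is calm.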

Pricing and Licensing

Salesforce prices Agentforce Voice on conversation length and tasks completed rather than per interaction, a deliberate contrast with Sierra's per-resolution model. Usage draws down through the Flex Credit system that the rest of Agentforce consumption uses. Budget modeling tip from orgs that went first: average handle time on voice is the variable that moves your bill, so an agent that rambles costs real money. Tight instructions are now a finance concern, which is a sentence nobody expected to write in 2024.

License-wise, the Summer '26 voice features require Enterprise, Performance, Unlimited, or Developer Editions with Foundations or Agentforce 1 Editions. The contact center path additionally assumes Service Cloud and a telephony partner. None of it is free, and a pilot's economics look very different from a full deflection program, so instrument cost per resolved call from the first week.
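A simple way to instrument cost per resolved call is to model each call's consumption from its length and task count. The sketch below assumes that shape of pricing; `credits_per_minute`, `credits_per_task`, and `price_per_credit` are placeholder parameters, not Salesforce rates, so substitute the figures from your own contract.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CallRecord:
    minutes: float  # conversation length
    tasks: int      # actions the agent completed
    resolved: bool  # contained without a human


def cost_per_resolved_call(calls: Iterable[CallRecord],
                           credits_per_minute: float,
                           credits_per_task: float,
                           price_per_credit: float) -> float:
    calls = list(calls)
    # In this model every call consumes credits, escalated ones included.
    total_credits = sum(c.minutes * credits_per_minute + c.tasks * credits_per_task
                        for c in calls)
    resolved = sum(1 for c in calls if c.resolved)
    if resolved == 0:
        return float("inf")  # spend with nothing contained
    return total_credits * price_per_credit / resolved
```

Dividing by resolved calls rather than all calls is the design choice that matters: an agent that talks for six minutes and then escalates still shows up in the numerator, which is exactly how a rambling agent inflates the bill.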

Where It Still Falls Short

A fair scorecard as of June 2026: latency is good but not invisible, and long action chains can produce pauses you will want to paper over with verbal acknowledgments. Regional and language coverage trails chat. Noisy environments and heavy accents still degrade transcription enough to matter, which is why read-back confirmation belongs in every consequential topic. And the hallucination risk discussed earlier never fully goes away; it gets managed with grounding, narrow topics, and ruthless escalation rules. If a vendor tells you their voice agent has none of these problems, they are describing a roadmap, not a product.
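Papering over a long action chain usually means speaking a short acknowledgment only when the work is actually slow. Here is a hypothetical `asyncio` sketch of that idea; `run_with_acknowledgment`, the `speak` callback, and the 0.8-second default are assumptions for illustration, not platform behavior.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_acknowledgment(action: Awaitable[str],
                                  speak: Callable[[str], None],
                                  ack_after: float = 0.8,
                                  ack_text: str = "One moment while I pull that up.") -> str:
    task = asyncio.ensure_future(action)
    # Wait up to ack_after seconds; asyncio.wait does not cancel the task on timeout.
    done, _pending = await asyncio.wait({task}, timeout=ack_after)
    if not done:
        speak(ack_text)  # fill the silence only when the action is slow
    result = await task
    speak(result)
    return result
```

Fast answers go out with no filler at all, so callers only hear "one moment" when there really is a moment to wait.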

What to Do Next

Pick the one call driver in your contact center with the highest volume and the lowest stakes. Appointment rescheduling, order status, store hours, password resets. Build a single-topic voice agent for it in a sandbox this week, call it yourself from a noisy room, and try to break it: talk over it, mumble a date, ask for two things at once. Then put a five-call-a-day pilot behind your existing IVR's "anything else" branch and measure containment rate and escalation quality for thirty days. You will learn more from those thirty days than from any vendor deck, including this post.
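Containment rate is easy to compute once every pilot call is tagged with an outcome. The labels below (`contained`, `escalated`, `abandoned`) are hypothetical, and definitions of containment vary between teams; in this sketch abandoned calls count against containment, which is the stricter reading.

```python
from collections import Counter
from typing import Dict, Iterable


def pilot_summary(outcomes: Iterable[str]) -> Dict[str, float]:
    """outcomes: one label per call, e.g. 'contained', 'escalated', 'abandoned'."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {"calls": 0, "containment_rate": 0.0, "escalation_rate": 0.0}
    return {
        "calls": total,
        # Resolved by the agent with no human handoff, over all calls.
        "containment_rate": counts["contained"] / total,
        "escalation_rate": counts["escalated"] / total,
    }
```

Escalation quality needs a human signal on top of this, such as whether the receiving agent had to ask the caller to repeat anything, which is worth adding as a field in your wrap-up notes from day one.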

About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
