More diverse phrases improve intent recognition accuracy.

AI | Intermediate

Training Phrase

Q: What is the most important tip for working with training phrases?

Spend an hour pulling real conversation phrases before writing a single invented one. Real phrasing contains the slang, typos, and incomplete sentences your bot will actually see, and no amount of author effort beats real data.

Q: What is a Training Phrase?

Training Phrases teach bots to recognize user intent.

Q: How should you improve them?

Real user language should inform training phrase updates.

§ 01

Definition

A training phrase is one literal example of a user message that should be classified as a specific intent in Salesforce Einstein Bots or the legacy Einstein Intent Service. Authors write 20 to 100 training phrases per intent, the underlying NLP model learns the cluster of phrases that map to each intent, and at runtime the model classifies new messages by finding the closest cluster. Training phrases are the raw material the model learns from; the variety and realism of the phrases determine how well the bot generalizes to messages it has never seen.

Training phrases are the most consequential authoring decision in any traditional bot. The bot can only classify messages similar to its training. A model trained on twenty near-identical phrases matches only near-identical messages in production and frustrates users who phrase things naturally. A model trained on thirty to fifty varied phrases across length, vocabulary, and grammar handles real user variation. Pulling phrases from real conversation logs (email-to-case subject lines, web form submissions, prior chat transcripts) beats inventing phrases every time, because real phrasing contains the slang, typos, and incomplete sentences the bot will actually see.
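As a rough illustration of "classify by finding the closest cluster", the sketch below scores a new message against each intent's training phrases using word-overlap (Jaccard) similarity. It is a toy stand-in, not the model Einstein Bots actually uses; the names `tokens`, `jaccard`, `classify`, and the sample `TRAINING` data are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a message and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(message: str, training: dict[str, list[str]]) -> tuple[str, float]:
    """Return the intent whose closest training phrase best matches the message.

    Returns ("", 0.0) when the message shares no words with any phrase.
    """
    msg = tokens(message)
    best_intent, best_score = "", 0.0
    for intent, phrases in training.items():
        score = max((jaccard(msg, tokens(p)) for p in phrases), default=0.0)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


# Hypothetical sample data for two intents.
TRAINING = {
    "Order Status": ["where is my order", "order status?", "tell me where my order is"],
    "Returns": ["I want to return this order", "how do I send an item back"],
}

if __name__ == "__main__":
    print(classify("where is my package", TRAINING))
    print(classify("hello there", TRAINING))
```

With this data, "where is my package" lands on Order Status because it shares three words with "where is my order", while a message with no shared vocabulary matches nothing. That is the same effect, in miniature, that makes narrowly trained intents miss naturally phrased messages.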

§ 02

Why training phrases are the single most consequential bot authoring decision

How many training phrases an intent needs

The floor is around 20 phrases for an intent to function. Below that the model overfits to the small set and matches almost nothing in production. The sweet spot is 30 to 50 varied phrases. Above 100, returns diminish quickly. Each additional phrase needs to add genuine variety; adding ten more phrases that paraphrase existing ones does not improve the model meaningfully. The Salesforce Bot Training View shows phrase count per intent and warns when intents are under the floor. Treat the warning as a hard rule, not a suggestion; under-trained intents are the most reliable source of bot misclassification in production.
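The thresholds above are easy to encode as a pre-deployment check. The sketch below is a minimal example that assumes phrase counts per intent have already been exported into a plain dictionary; the function names and status labels are hypothetical and are not part of the Bot Training View.

```python
FLOOR = 20                 # below this, the intent is under-trained
SWEET_SPOT = (30, 50)      # recommended range of varied phrases
DIMINISHING = 100          # above this, extra phrases add little


def phrase_count_status(count: int) -> str:
    """Map a phrase count to a status label based on the thresholds above."""
    if count < FLOOR:
        return "under floor"
    if count < SWEET_SPOT[0]:
        return "functional, below sweet spot"
    if count <= SWEET_SPOT[1]:
        return "sweet spot"
    if count <= DIMINISHING:
        return "above sweet spot"
    return "diminishing returns"


def under_floor(counts: dict[str, int]) -> list[str]:
    """Return the names of intents that must not ship yet, sorted by name."""
    return sorted(name for name, n in counts.items() if n < FLOOR)


if __name__ == "__main__":
    counts = {"Order Status": 42, "Returns": 14, "Billing": 120}
    for name, n in counts.items():
        print(f"{name}: {n} phrases -> {phrase_count_status(n)}")
    print("Blocked:", under_floor(counts))
```

Note that the check only counts phrases; it cannot tell whether the phrases add genuine variety, which still needs a human review.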

What variety actually means in training phrases

Variety has three dimensions. Length: short phrases (3-5 words), medium (6-15 words), long (more than 15 words). User messages span all three; training only on medium produces a model that misses short and long messages. Vocabulary: formal language, casual language, slang, abbreviations, common typos. A user who writes "wheres my pkg" should match the same intent as one who writes "Could you please provide an update on the status of my recent order." Grammar: questions, statements, fragments, commands. "Order status?" is a fragment that should match. "Tell me where my order is" is a command. "I want to know about my order" is a statement. All three should classify to the same intent.
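Of the three dimensions, length is the one that can be checked mechanically. The sketch below buckets phrases by word count and reports which buckets an intent is missing. The function names are hypothetical, and treating anything of five words or fewer as short and exactly 15 words as medium is an assumption made for the example. Vocabulary and grammar variety are not covered here and still need a manual read-through.

```python
from collections import Counter

BUCKETS = ("short", "medium", "long")


def length_bucket(phrase: str) -> str:
    """Classify a phrase by whitespace-separated word count."""
    n = len(phrase.split())
    if n <= 5:
        return "short"
    if n <= 15:
        return "medium"
    return "long"


def length_coverage(phrases: list[str]) -> dict[str, int]:
    """Count how many phrases fall into each length bucket."""
    counts = Counter(length_bucket(p) for p in phrases)
    return {bucket: counts.get(bucket, 0) for bucket in BUCKETS}


def missing_buckets(phrases: list[str]) -> list[str]:
    """List the length buckets that have no phrases at all."""
    return [b for b, n in length_coverage(phrases).items() if n == 0]


if __name__ == "__main__":
    phrases = [
        "wheres my pkg",
        "Could you please provide an update on the status of my recent order.",
    ]
    print(length_coverage(phrases))
    print("Missing:", missing_buckets(phrases))
```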

Pulling training phrases from real conversation logs

The fastest path to high-quality training phrases is mining existing conversation data. Email-to-case subject lines are the most underused source; they are short, real, and already labeled by category. Web form submissions where users picked a topic from a dropdown give labeled examples for free. Existing chat transcripts (if any) give the exact phrasing the new bot will see. Internal Slack channels for support questions occasionally help. The discipline that pays off: spend an hour pulling 100 real phrases per intent before writing a single invented phrase. The model trained on real phrases consistently outperforms the model trained on invented ones, regardless of author effort.
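One way to turn already-labeled sources into a candidate pool is sketched below. It assumes the data has been exported as `(label, text)` pairs, for example a case category and its email subject line; the export format and the `collect_candidates` name are hypothetical. The function only collapses whitespace and drops case-insensitive duplicates, and it deliberately leaves typos and slang untouched because that realism is the point of mining real data.

```python
from collections import defaultdict


def collect_candidates(
    records: list[tuple[str, str]], limit: int = 100
) -> dict[str, list[str]]:
    """Group real phrases by intent label, dropping blanks and duplicates.

    Keeps at most `limit` candidates per intent, in first-seen order.
    """
    seen: set[tuple[str, str]] = set()
    pool: dict[str, list[str]] = defaultdict(list)
    for label, text in records:
        cleaned = " ".join(text.split())      # collapse runs of whitespace
        key = (label, cleaned.lower())        # duplicates differ only by case
        if not cleaned or key in seen:
            continue
        seen.add(key)
        if len(pool[label]) < limit:
            pool[label].append(cleaned)
    return dict(pool)


if __name__ == "__main__":
    records = [
        ("Order Status", "wheres my pkg"),
        ("Order Status", "Wheres  my PKG"),   # duplicate after normalization
        ("Returns", "send it back"),
        ("Returns", "   "),                   # blank, skipped
    ]
    print(collect_candidates(records))
```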

Negative training phrases and how they tighten boundaries

Einstein Bots supports negative training phrases on each intent. A negative phrase tells the model: messages like this should not match this intent. For example, an Order Status intent with the positive phrase "where is my order" and the negative phrase "I want to return this order" draws a tighter boundary against the Returns intent. Negative phrases are most valuable when two intents share vocabulary; without them, the bot flips between the two on similar messages. Aim for three to five negative phrases per intent that overlaps with another. Skip negative training for intents with no overlap; it is busywork that adds no value.
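Deciding which intents overlap can be approximated by comparing their vocabularies. The sketch below is a heuristic under stated assumptions, not how Einstein Bots detects overlap: the `Intent` dataclass, the small `STOPWORDS` list, and the `min_shared` threshold are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass, field
from itertools import combinations

# Assumption: a tiny hand-picked stopword list so filler words do not count as overlap.
STOPWORDS = {"i", "my", "is", "this", "to", "the", "a", "want"}


@dataclass
class Intent:
    name: str
    positive: list[str]
    negative: list[str] = field(default_factory=list)


def words(phrase: str) -> set[str]:
    """Content words of a phrase, lowercased, with stopwords removed."""
    return set(re.findall(r"[a-z0-9']+", phrase.lower())) - STOPWORDS


def vocabulary(intent: Intent) -> set[str]:
    """All content words used across an intent's positive phrases."""
    vocab: set[str] = set()
    for phrase in intent.positive:
        vocab |= words(phrase)
    return vocab


def overlapping_pairs(intents: list[Intent], min_shared: int = 2):
    """Return (name_a, name_b, shared_words) for intent pairs sharing vocabulary."""
    pairs = []
    for a, b in combinations(intents, 2):
        shared = vocabulary(a) & vocabulary(b)
        if len(shared) >= min_shared:
            pairs.append((a.name, b.name, sorted(shared)))
    return pairs


def negative_candidates(target: Intent, other: Intent) -> list[str]:
    """Phrases from `other` that use shared words: candidate negatives for `target`."""
    shared = vocabulary(target) & vocabulary(other)
    return [p for p in other.positive if words(p) & shared]


order_status = Intent("Order Status", ["where is my order", "track my package"])
returns = Intent("Returns", ["I want to return this order", "send my package back"])
billing = Intent("Billing", ["update my credit card"])

if __name__ == "__main__":
    print(overlapping_pairs([order_status, returns, billing]))
    print(negative_candidates(order_status, returns))
```

Here only Order Status and Returns are flagged, so only that pair gets negative phrases; Billing shares no content words with either and is left alone.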

Common authoring mistakes

Four mistakes recur. First, training phrases that are too similar to each other; the model learns the narrow pattern and rejects everything else. Second, training phrases that include named entities (customer name, order number) verbatim; the model learns to match the specific value rather than the pattern. Third, training phrases written by the author rather than pulled from real data; the author writes how the author talks, not how users do. Fourth, training only on the happy path; the bot misclassifies users who phrase their question slightly differently or include irrelevant detail. Each mistake produces a bot that worked great in testing and failed in production.
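The first mistake, phrases that are too similar to each other, can be caught before training with a simple similarity pass. The sketch below uses Python's standard `difflib.SequenceMatcher` to flag near-duplicate pairs; the 0.9 threshold and the function names are assumptions to tune against your own data, not a documented Salesforce rule.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(phrase.lower().split())


def near_duplicates(phrases: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return pairs of phrases whose character-level similarity meets the threshold."""
    flagged = []
    for a, b in combinations(phrases, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            flagged.append((a, b))
    return flagged


if __name__ == "__main__":
    phrases = ["where is my order", "Where is my order?", "wheres my pkg"]
    for a, b in near_duplicates(phrases):
        print(f"Near-duplicate: {a!r} / {b!r}")
```

The comparison is pairwise, which is fine for 30 to 100 phrases per intent but would need a different approach for much larger pools.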

Tuning workflow with the confusion matrix

After the bot has run in production for two to four weeks, the Intent Insights confusion matrix shows where the model is making mistakes. Off-diagonal cells are intent confusions: Order Status messages classified as Returns, or vice versa. The fix is targeted: add positive training phrases for the misclassified pattern, or add negative training phrases that exclude the confused intent. Re-train and observe the next two weeks. Most production bots need three to four rounds of confusion-matrix-driven tuning to reach an acceptable confusion rate. The work is iterative, not one-time. Build the tuning cadence into the bot's lifecycle from the start.
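The off-diagonal cells are straightforward to compute outside the report as well. The sketch below assumes a reviewed sample of production messages has been exported as `(expected_intent, predicted_intent)` pairs, where the expected intent comes from a human reviewer; that export format and the function names are hypothetical and are not part of Intent Insights.

```python
from collections import Counter

Pair = tuple[str, str]  # (expected intent, predicted intent)


def top_confusions(pairs: list[Pair], limit: int = 5) -> list[tuple[Pair, int]]:
    """Return the most frequent off-diagonal cells, largest first."""
    off_diagonal = Counter((exp, pred) for exp, pred in pairs if exp != pred)
    return off_diagonal.most_common(limit)


def confusion_rate(pairs: list[Pair]) -> float:
    """Fraction of reviewed messages that were classified as the wrong intent."""
    if not pairs:
        return 0.0
    wrong = sum(1 for exp, pred in pairs if exp != pred)
    return wrong / len(pairs)


if __name__ == "__main__":
    pairs = (
        [("Order Status", "Order Status")] * 8
        + [("Order Status", "Returns")] * 3
        + [("Returns", "Order Status")] * 1
        + [("Returns", "Returns")] * 6
    )
    for (expected, predicted), count in top_confusions(pairs):
        print(f"{expected} classified as {predicted}: {count}")
    print(f"Confusion rate: {confusion_rate(pairs):.1%}")
```

Sorting by count puts the largest confusion first, which is where a round of positive or negative phrase additions is likely to pay off most. Tracking the confusion rate after each re-train shows whether a tuning round actually helped.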

Training phrases vs Agentforce topic descriptions

Agentforce topics replace training phrases with natural-language classification descriptions. Instead of writing 30 phrases, the author writes one description: "Use this topic when the user asks about order status, delivery date, tracking number, or where their package is." The LLM-driven classifier evaluates the description against the message and picks the topic. The trade-off is real. Training phrases give the author tight control over what matches; the description gives the author less control but covers more ground per unit effort. Most new bot work in 2026 is happening on topics rather than intents, but the discipline of writing precise positive and negative examples carries over directly; a good topic description is just compressed training phrases with negative cases inline.

§ 03

How to author training phrases that hold up in production

The successful authoring pattern is real-data-first, varied across length, vocabulary, and grammar, with negative phrases for overlapping intents, and a tuning cadence driven by the confusion matrix from production. Skipping any of those steps produces a bot that demos well and fails for real users.

1. Spend an hour pulling real phrases per intent
   Mine email-to-case subject lines, web form submissions, and any prior chat logs. Aim for 100 real candidates per intent before writing a single invented phrase.
2. Pick 30 to 50 varied phrases per intent from the candidate pool
   Vary length (short, medium, long), vocabulary (formal, casual, slang), and grammar (questions, statements, fragments, commands). Skip near-duplicates; each phrase should add genuine variety.
3. Strip out named entities from phrases
   Replace specific values (customer names, order numbers, dates) with generic placeholders or remove them. The model should learn the pattern, not the specific value.
4. Add 3 to 5 negative training phrases per overlapping intent
   For each pair of intents that share vocabulary, add explicit negative phrases that belong to the other intent. Skip negative training for intents with no overlap.
5. Train the model and test in the bot preview
   Send the candidate messages back through the bot and confirm they classify as expected. Send variations the model has not seen. Watch for intents that match too broadly or too narrowly.
6. Pilot in production for two to four weeks
   Real production conversations surface misclassifications that the preview misses. Pilot with a small user group before broad rollout.
7. Tune from the Intent Insights confusion matrix
   Open the confusion matrix after two to four weeks. Off-diagonal cells are the targets. Add positive or negative training phrases to crispen the boundaries. Re-train and observe.
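The step on stripping named entities lends itself to a small script. The sketch below replaces dates and long digit runs with generic placeholders before phrases are entered into the bot. The regular expressions, the `{date}` and `{order_number}` placeholder text, and the assumption that order numbers are runs of five or more digits are all hypothetical; adjust them to your own data. Customer names are not handled because they need a name list or a manual pass.

```python
import re

# Order matters: dates are replaced first so their digits are not mistaken
# for part of an order number.
PLACEHOLDER_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "{date}"),
    (re.compile(r"#?\b\d{5,}\b"), "{order_number}"),
]


def genericize(phrase: str) -> str:
    """Replace specific dates and order numbers with generic placeholders."""
    for pattern, placeholder in PLACEHOLDER_PATTERNS:
        phrase = pattern.sub(placeholder, phrase)
    return phrase


if __name__ == "__main__":
    raw = [
        "where is order #1234567 placed on 03/14/2025",
        "wheres my pkg",
    ]
    for phrase in raw:
        print(genericize(phrase))
```

Phrases without entity values pass through unchanged, so the script can be run over the whole candidate pool.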

Key options

Phrase count per intent

30 to 50 is the sweet spot. Below 20 the model overfits to the small set; above 100 returns diminish quickly.

Source of phrases

Real conversation data (email-to-case, web forms, chat logs) outperforms invented phrases every time.

Variety dimensions

Length (short, medium, long), vocabulary (formal, casual, slang), grammar (question, statement, fragment, command). Each axis matters.

Negative training phrases

Three to five per overlapping intent pair. Skip for intents with no overlap.

Tuning cadence

Two to four weeks of pilot data, then confusion-matrix-driven tuning. Iterate three or four rounds for production-ready quality.

Gotchas

Twenty near-identical phrases produce a brittle model. Variety matters more than quantity; add 30 varied phrases rather than 60 paraphrases.
Training phrases that include verbatim entity values teach the model to match the value, not the pattern. Strip or genericize specific values.
Phrases invented by the author capture how the author writes, not how users do. Real-data-first is the discipline that distinguishes bots that work from bots that demo.
Skipping negative training for overlapping intents produces a bot that flips its classification on near-identical messages. The fix is targeted negative phrases, not more positive ones.
Tuning is iterative, not one-time. Most production bots need three to four rounds of confusion-matrix-driven tuning to reach acceptable quality.

About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
