The intents that work in production are designed around real user phrasing, not the author's imagination. Pull conversation logs (even from another channel, such as email-to-case), write training phrases that match how customers actually word their questions, and then tune against the confusion matrix from real bot runs.

List the intents the bot must recognize
Start with five to ten intents covering the most common user goals. Add an Out of Scope intent that the bot uses to escalate when no other intent fits. Resist starting with twenty intents; you cannot tune that many in parallel.
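A minimal sketch of a sanity check for the initial intent list follows. It assumes the catalog is kept as a plain list of names. The intent names and the `check_catalog` helper are hypothetical. It also treats the five-to-ten range as excluding the Out of Scope fallback, which is one reading of the guidance above.

```python
# Hypothetical catalog check; names are illustrative, not from any product.
OUT_OF_SCOPE = "Out of Scope"


def check_catalog(intents: list[str], min_size: int = 5, max_size: int = 10) -> list[str]:
    """Return a list of problems with an initial intent catalog (empty if none)."""
    problems = []
    if OUT_OF_SCOPE not in intents:
        problems.append(f"missing the {OUT_OF_SCOPE!r} fallback intent")
    # Assumption: the fallback does not count toward the five-to-ten goal intents.
    goal_intents = [name for name in intents if name != OUT_OF_SCOPE]
    if len(set(goal_intents)) != len(goal_intents):
        problems.append("duplicate intent names")
    if not min_size <= len(goal_intents) <= max_size:
        problems.append(f"{len(goal_intents)} goal intents; start with {min_size} to {max_size}")
    return problems


catalog = ["Order Status", "Returns", "Shipping Cost", "Reset Password", "Update Address", OUT_OF_SCOPE]
print(check_catalog(catalog))  # []
```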
Pull real user phrasing from existing channels
Look at email-to-case subject lines, web form submissions, and any existing chat logs. Extract 100 to 200 real messages per intent. Real phrasing beats invented phrasing every time.
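As a sketch, exported messages can be turned into a per-intent sample like this. It assumes someone has already hand-labeled each exported message with an intent, which the steps above do not describe. The `sample_per_intent` helper and the `(intent, message)` pair format are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_intent(
    labeled: list[tuple[str, str]], per_intent: int = 200, seed: int = 0
) -> dict[str, list[str]]:
    """Group hand-labeled (intent, message) pairs, drop duplicates that differ
    only in case or whitespace, and keep up to `per_intent` messages per intent.
    The labeled-pair input format is an assumption for illustration."""
    by_intent: dict[str, list[str]] = defaultdict(list)
    seen: set[tuple[str, str]] = set()
    for intent, text in labeled:
        key = (intent, " ".join(text.lower().split()))  # normalized for dedup
        if key in seen:
            continue
        seen.add(key)
        by_intent[intent].append(text.strip())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        intent: rng.sample(texts, min(per_intent, len(texts)))
        for intent, texts in by_intent.items()
    }


logs = [
    ("Order Status", "Where is my order?"),
    ("Order Status", "where is my  order?"),  # duplicate after normalization
    ("Returns", "How do I send this back"),
]
samples = sample_per_intent(logs)
print({k: len(v) for k, v in samples.items()})  # {'Order Status': 1, 'Returns': 1}
```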
Author 30 to 50 varied training phrases per intent
Vary length (short, medium, long), vocabulary (formal, casual, slang), and grammar (questions, statements, fragments). Twenty near-identical phrases produce a brittle model. Forty varied phrases produce a model that generalizes.
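One rough way to spot near-identical phrases is to compare token overlap (Jaccard similarity) between every pair of training phrases. This is a swapped-in heuristic, not a feature of any bot platform. The 0.8 cutoff and the helper names are assumptions to adjust for your own data.

```python
import re
from itertools import combinations


def tokens(phrase: str) -> set[str]:
    """Lowercased word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9']+", phrase.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens two phrases have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(phrases: list[str], cutoff: float = 0.8) -> list[tuple[str, str, float]]:
    """Phrase pairs whose token overlap is at or above `cutoff` (assumed value)."""
    flagged = []
    for p, q in combinations(phrases, 2):
        score = jaccard(tokens(p), tokens(q))
        if score >= cutoff:
            flagged.append((p, q, round(score, 2)))
    return flagged


phrases = ["Where is my order", "Where is my order?", "where's my package at", "Has my order shipped yet"]
print(near_duplicates(phrases))  # [('Where is my order', 'Where is my order?', 1.0)]
```

Jaccard overlap only catches surface repetition. Two phrases with the same structure but different nouns will pass, so an empty result does not by itself prove the set is varied.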
Add negative training for intents that overlap
For each pair of intents that share vocabulary (Order Status and Returns both mention "order"), add three to five negative training phrases that explicitly belong to the other intent. This sharpens the boundary between them.
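The overlap check can be sketched as follows. It assumes each intent is stored as a dict holding its training phrases plus its negative phrases, grouped by the intent they actually belong to. That layout, the tiny stopword list, and the rule that one shared content word counts as overlap are all assumptions for illustration, not any product's data model.

```python
import re
from itertools import combinations

# Illustrative stopword list (assumption); a real check would use a fuller one.
STOPWORDS = {"a", "an", "the", "i", "my", "is", "to", "do", "how", "where",
             "what", "can", "you", "me", "for", "of", "it"}


def vocab(phrases: list[str]) -> set[str]:
    """Content words used across an intent's training phrases."""
    return {w for p in phrases for w in re.findall(r"[a-z']+", p.lower()) if w not in STOPWORDS}


def overlapping_pairs(intents: dict, min_shared: int = 1) -> list[tuple[str, str, list[str]]]:
    """Intent pairs whose training phrases share at least `min_shared` content words."""
    pairs = []
    for a, b in combinations(sorted(intents), 2):
        shared = vocab(intents[a]["phrases"]) & vocab(intents[b]["phrases"])
        if len(shared) >= min_shared:
            pairs.append((a, b, sorted(shared)))
    return pairs


def missing_negatives(intents: dict, minimum: int = 3) -> list[tuple[str, str, int]]:
    """For each overlapping pair, each intent should carry at least `minimum`
    negative phrases that belong to the other intent."""
    gaps = []
    for a, b, _shared in overlapping_pairs(intents):
        for this, other in ((a, b), (b, a)):
            count = len(intents[this]["negatives"].get(other, []))
            if count < minimum:
                gaps.append((this, other, count))
    return gaps


# Hypothetical intent data in the assumed layout.
intents = {
    "Order Status": {
        "phrases": ["Where is my order", "Has my order shipped"],
        "negatives": {"Returns": ["I want to return my order",
                                  "How do I send my order back",
                                  "Refund my order please"]},
    },
    "Returns": {"phrases": ["I want to return my order", "How do I get a refund"], "negatives": {}},
    "Reset Password": {"phrases": ["I forgot my password"], "negatives": {}},
}
print(overlapping_pairs(intents))  # [('Order Status', 'Returns', ['order'])]
print(missing_negatives(intents))  # [('Returns', 'Order Status', 0)]
```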
Set the confidence threshold based on initial logs
Start at 0.7 and review the first two weeks of production conversations. If too many messages that barely clear the threshold are routed to the wrong intent and fail, raise the threshold. If too many messages the bot could have handled are escalated instead, lower it.
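Below is a sketch of how the threshold gates routing and how the two-week review could be tallied. It assumes you can export, for each turn, the top intent's confidence and a reviewer's judgment of whether that intent was correct. The `LoggedTurn` record and both helpers are hypothetical.

```python
from dataclasses import dataclass

OUT_OF_SCOPE = "Out of Scope"


def route(scores: dict[str, float], threshold: float = 0.7) -> str:
    """Return the top-scoring intent, or Out of Scope (escalate) if its
    confidence is below the threshold."""
    intent, confidence = max(scores.items(), key=lambda item: item[1])
    return intent if confidence >= threshold else OUT_OF_SCOPE


@dataclass
class LoggedTurn:
    confidence: float         # classifier confidence for the top intent
    top_intent_correct: bool  # assumed reviewer label: was the top intent right?


def threshold_report(turns: list[LoggedTurn], threshold: float) -> dict[str, int]:
    """Count the two failure modes a given threshold trades off."""
    return {
        # Cleared the threshold but the intent was wrong: argues for raising it.
        "wrong_but_routed": sum(t.confidence >= threshold and not t.top_intent_correct for t in turns),
        # Fell below the threshold although the intent was right: argues for lowering it.
        "right_but_escalated": sum(t.confidence < threshold and t.top_intent_correct for t in turns),
    }


print(route({"Order Status": 0.82, "Returns": 0.11}))  # Order Status
print(route({"Order Status": 0.55, "Returns": 0.40}))  # Out of Scope
```

Running `threshold_report` over the same logged turns at a few candidate thresholds shows the trade-off directly: a high `wrong_but_routed` count argues for raising the threshold, a high `right_but_escalated` count for lowering it.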
Use the Intent Insights confusion matrix to drive tuning
After two to four weeks in production, open the Intent Insights report. The off-diagonal cells show which intents are being confused with each other. Add training phrases or negative training to sharpen those boundaries.
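The Intent Insights report computes this for you. The sketch below only illustrates what the off-diagonal cells mean. It assumes you have exported (true intent, predicted intent) pairs from reviewed conversations; the export format and function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(pairs: list[tuple[str, str]]) -> tuple[list[str], list[list[int]]]:
    """Rows are the true intent, columns the predicted intent."""
    labels = sorted({label for pair in pairs for label in pair})
    counts = Counter(pairs)
    return labels, [[counts[(t, p)] for p in labels] for t in labels]


def top_confusions(pairs: list[tuple[str, str]], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Off-diagonal cells (true intent != predicted intent), most frequent first."""
    return Counter((t, p) for t, p in pairs if t != p).most_common(n)


# Hypothetical reviewed pairs: (true intent, predicted intent).
pairs = (
    [("Order Status", "Order Status")] * 8
    + [("Order Status", "Returns")] * 3
    + [("Returns", "Returns")] * 6
    + [("Returns", "Order Status")] * 1
)
labels, matrix = confusion_matrix(pairs)
print(labels)                 # ['Order Status', 'Returns']
print(matrix)                 # [[8, 3], [1, 6]]
print(top_confusions(pairs))  # [(('Order Status', 'Returns'), 3), (('Returns', 'Order Status'), 1)]
```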
Re-train monthly for the first three months, then quarterly
Intent models drift as user phrasing evolves. Monthly retraining for the first quarter catches the early variation. Quarterly thereafter is sufficient for most bots.
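Here is a minimal scheduling sketch. It assumes "monthly for the first three months" means retrains one, two, and three months after launch, then every three months. The helper names are hypothetical, and the day of the month is capped at 28 to keep the date arithmetic simple.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is capped at 28 so every month is valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def retraining_dates(launch: date, count: int = 6) -> list[date]:
    """Monthly retrains for the first three months after launch, quarterly after that."""
    dates, offset = [], 0
    while len(dates) < count:
        offset += 1 if offset < 3 else 3
        dates.append(add_months(launch, offset))
    return dates


for d in retraining_dates(date(2024, 1, 15)):
    print(d.isoformat())
# 2024-02-15, 2024-03-15, 2024-04-15, 2024-07-15, 2024-10-15, 2025-01-15
```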

Key options

Training phrases

Literal examples of user messages the intent should match. Aim for 30 to 50 varied phrases per intent.

Negative training phrases

Messages that should not match this intent. Critical for intents that share vocabulary with another intent.

Confidence threshold

Minimum confidence required to trust the picked intent. Tuned to balance over-matching and over-escalation.

Out of Scope intent

Catch-all intent for messages that fit no defined intent. Drives escalation behavior.

Re-training cadence

How often the intent model retrains against new conversation data. Monthly for new bots, quarterly for stable ones.
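The options above could be collected into a single configuration object, sketched here as Python dataclasses. The class and field names are hypothetical and do not correspond to any platform's settings schema. The defaults mirror the guidance in this entry: a 0.7 threshold, 30 to 50 phrases per intent, and monthly retraining for a new bot.

```python
from dataclasses import dataclass, field


@dataclass
class IntentDefinition:
    name: str
    training_phrases: list[str] = field(default_factory=list)
    negative_phrases: list[str] = field(default_factory=list)


@dataclass
class BotIntentConfig:  # hypothetical; not any platform's schema
    intents: list[IntentDefinition]
    confidence_threshold: float = 0.7
    out_of_scope_intent: str = "Out of Scope"
    retrain_cadence: str = "monthly"  # switch to "quarterly" once the bot is stable

    def warnings(self) -> list[str]:
        """Flag settings that depart from the guidance in this entry."""
        msgs = []
        if not 0.0 < self.confidence_threshold < 1.0:
            msgs.append("confidence_threshold should be between 0 and 1")
        if self.retrain_cadence not in ("monthly", "quarterly"):
            msgs.append(f"unexpected retrain_cadence {self.retrain_cadence!r}")
        if self.out_of_scope_intent not in {i.name for i in self.intents}:
            msgs.append(f"no {self.out_of_scope_intent!r} intent defined")
        for intent in self.intents:
            if intent.name == self.out_of_scope_intent:
                continue  # the catch-all is not held to the 30-50 phrase target here
            n = len(intent.training_phrases)
            if not 30 <= n <= 50:
                msgs.append(f"{intent.name}: {n} training phrase(s), aim for 30 to 50")
        return msgs


config = BotIntentConfig(intents=[
    IntentDefinition("Returns", training_phrases=["I want to return this"]),
    IntentDefinition("Out of Scope"),
])
print(config.warnings())  # ['Returns: 1 training phrase(s), aim for 30 to 50']
```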

Gotchas

Twenty near-identical training phrases produce a brittle model that only matches near-identical messages. Vary length, vocabulary, and grammar across the training set.
Confusing intents (which classify what the user wants) with entities (which extract specific values from the message) is a common authoring error. Order numbers belong in entity definitions; phrasing patterns belong in training phrases.
Without negative training, intents that share vocabulary can flip from one turn to the next, so similar messages land in different intents. Add explicit negative phrases for every overlapping pair.
The confidence threshold cannot fix bad training data. Tuning the threshold helps at the margins; tuning the phrases fixes the underlying issue.
Intent models drift as user phrasing evolves. A model that worked in Q1 may need retraining in Q3. Schedule the retraining cadence rather than waiting for complaints.
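The intent/entity split in the gotchas above can be sketched as two separate concerns: an entity pattern that extracts order numbers, and a check that flags training phrases containing literal order numbers. The order-number format here is a hypothetical example; substitute your own entity definition.

```python
import re
from typing import Optional

# Hypothetical order-number format: "ORD-" plus 6+ digits, or a bare 8-digit number.
ORDER_NUMBER = re.compile(r"\b(?:ORD-\d{6,}|\d{8})\b", re.IGNORECASE)


def extract_order_number(message: str) -> Optional[str]:
    """Entity extraction: return the order number in a message, if any."""
    match = ORDER_NUMBER.search(message)
    return match.group(0) if match else None


def entity_leaks(training_phrases: list[str]) -> list[str]:
    """Training phrases that contain literal order numbers; the values belong
    in the entity definition, while the intent should learn the phrasing."""
    return [p for p in training_phrases if ORDER_NUMBER.search(p)]


print(extract_order_number("Where is order ORD-1234567?"))  # ORD-1234567
print(entity_leaks(["Where is order ORD-1234567", "Where is my order", "status of 12345678"]))
# ['Where is order ORD-1234567', 'status of 12345678']
```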
