Salesforce Dictionary
How-to guide

How to pick and configure an NLP model for a Salesforce feature

Most Salesforce features pick the NLP model for you. The work is providing good data and tuning the few model-level knobs the feature exposes.

By Dipojjal Chakrabarti · Founder & Editor, Salesforce Dictionary
Last updated May 16, 2026

  1. Identify the feature and its model type

    Einstein Bots uses Einstein NLU for intent classification. Einstein Case Classification uses a per-org fine-tuned classifier. Agentforce uses a foundation LLM. The configuration surface differs by feature; check the docs for the specific feature.

  2. Provide labeled training data

    The training data takes a different shape per feature: utterances per intent for bots, historical records per target class for classification, prompt templates and grounding data for generative features. The data is the actual customization; the model itself is fixed.

  3. Train, validate against a holdout

    Launch the build or training pipeline. The feature returns metrics. Spot-check predictions against a hand-labeled holdout set to confirm the metrics match real behavior.

  4. Tune the confidence threshold

    Use the feature's threshold setting to trade off action rate versus accuracy. Start conservative (high threshold) and lower as confidence in the model grows.

  5. Set a refresh schedule

    For features that retrain automatically, verify the schedule is on. For static features, calendar a manual retrain at least quarterly.
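The validation step above can be sketched as a small script. This is a minimal sketch, not Salesforce code: `predict` is a hypothetical stand-in for whatever prediction call the feature exposes, and the holdout set is the hand-labeled sample you built yourself.

```python
from collections import defaultdict

def holdout_report(holdout, predict):
    """Compare a feature's predictions against a hand-labeled holdout.

    holdout: list of (utterance, true_intent) pairs labeled by hand.
    predict: callable returning (intent, confidence) -- a hypothetical
             stand-in for the feature's real prediction API.
    Returns (overall_accuracy, {intent: accuracy}).
    """
    per_intent = defaultdict(lambda: [0, 0])  # intent -> [correct, total]
    for utterance, true_intent in holdout:
        predicted, _confidence = predict(utterance)
        stats = per_intent[true_intent]
        stats[1] += 1
        if predicted == true_intent:
            stats[0] += 1
    overall = sum(c for c, _ in per_intent.values()) / max(1, len(holdout))
    return overall, {i: c / t for i, (c, t) in per_intent.items()}
```

If the per-intent numbers here disagree with the metrics the training pipeline reported, trust the holdout: it reflects real utterances, not the training distribution.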

Key options
Model selection

Some features expose a model picker (Einstein NLU versus external). Most do not. When the picker exists, pick the model that matches the data language and domain.

Confidence threshold

The probability floor for the feature to act. Tune per intent or per use case rather than once across the whole bot.

Language

For multi-language deployments, configure one model per language. Multilingual foundation models handle generation natively but classification still needs per-language data.
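One-model-per-language can be sketched as a small registry lookup. The model names and the registry shape are assumptions for illustration, not Salesforce identifiers; the point is that locale resolution happens before classification.

```python
def pick_model(locale, models, default=None):
    """Resolve a user locale like 'fr-CA' to the classifier trained
    on that language's data (hypothetical model registry)."""
    lang = locale.split("-")[0].lower()
    return models.get(lang, default)

# Hypothetical registry: one fine-tuned classifier per language.
registry = {"en": "case_clf_en_v3", "fr": "case_clf_fr_v1"}
```

A deliberate default matters here too: falling back to the English model for an unsupported locale is a choice, and usually a worse one than routing straight to a human.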

Refresh cadence

How often the model retrains. Built into automatic features. Calendar manually for static ones.

Fallback behavior

What the feature does below the confidence threshold. Hand off to human, return a default, ask a clarifier. Design this deliberately rather than accepting the default.
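The confidence threshold and fallback behavior combine into one routing decision. A minimal sketch, assuming per-intent floors and a two-tier fallback (these tiers and the 0.2 near-miss band are illustrative choices, not a Salesforce default):

```python
def route(intent, confidence, thresholds, default_floor=0.7):
    """Act only when confidence clears the per-intent floor;
    below it, fall back deliberately instead of guessing."""
    floor = thresholds.get(intent, default_floor)
    if confidence >= floor:
        return ("act", intent)
    # Near-misses get a clarifying question; clear misses hand off.
    if confidence >= floor - 0.2:
        return ("clarify", intent)
    return ("handoff", intent)
```

Keeping thresholds in a per-intent dict rather than one global number is what makes the "tune per intent" advice above actionable.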

Gotchas
  • Switching the underlying model rarely fixes a misroute problem. Rewriting utterances and tuning thresholds almost always does. Spend effort on data, not on the model picker.
  • Language coverage is per-model. A model trained on English utterances misclassifies French ones, even if the underlying foundation model speaks both. Train per language for classification.
  • A confidence threshold tuned once at launch does not stay right. Customer language shifts, intent boundaries blur, and the correct threshold drifts. Revisit it quarterly.
  • Static NLP models degrade silently. A bot that was 92 percent accurate at launch can be 78 percent a year later and nobody notices until customer complaints pile up.
  • Foundation model and NLP model are not synonyms. Mixing the terms in design docs causes confusion about whether the feature can be fine-tuned or only prompted.
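Silent degradation only shows up if someone measures it. A minimal sketch of the monitoring the gotchas imply: re-score a holdout periodically and flag any period that slides more than a tolerance below the launch baseline (the tolerance and period keys are illustrative).

```python
def drift_alert(baseline, period_accuracy, tolerance=0.05):
    """Flag periods where holdout accuracy fell more than `tolerance`
    below the launch baseline -- silent degradation made visible.

    baseline: accuracy measured at launch, e.g. 0.92.
    period_accuracy: {period_label: accuracy} from recurring re-scoring.
    """
    return [period for period, acc in period_accuracy.items()
            if baseline - acc > tolerance]
```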

See the full NLP Model entry

NLP Model includes the definition, worked example, deep dive, related terms, and a quiz.