What is the most important tip for working with an AI model?

Define the business decision the model will inform in one sentence before training. If you cannot finish "this model will let us decide X," you are not ready to train.

AI Model

Q: Which AI Model type turns text or records into vectors for similarity search?

Embedding models turn text or records into vectors used for similarity search and duplicate detection. Predictive, classification, and recommendation models output scores, categories, and ranked lists instead.

Q: In AI Model training, what does data leakage describe?

Leakage means feeding the model features that were not knowable at prediction time, such as a field that is updated only after a deal closes. It is a training data problem, not a security or runtime concern.

Q: Why is plain accuracy often the wrong metric for evaluating an AI Model in an org?

A model that is 90 percent accurate is useless if the 10 percent it misses are the high-priority cases, so the business decision metric matters more than plain accuracy.

An AI model is a trained mathematical function that takes inputs (data, text, images, audio) and returns predictions, classifications, embeddings, or generated content.

§ 01

Definition

An AI model is a trained mathematical function that takes inputs (data, text, images, audio) and returns predictions, classifications, embeddings, or generated content. In the Salesforce context, AI models power Einstein features (case classification, prediction builder, opportunity scoring), Agentforce reasoning, Data Cloud identity resolution, and Tableau forecasting. Some models are Salesforce-managed and shared across the platform; others are customer-trained on the customer's own data; still others are third-party models accessed through brokered connectors.

The model is the artifact that gets trained, evaluated, deployed, monitored, and retired. Each phase has its own concerns. Training needs representative data and protected labels. Evaluation needs metrics that match the business goal, not just academic accuracy. Deployment needs governance around who can call the model and what data flows through. Monitoring catches drift before users complain. Retirement is the part that gets neglected and produces models in production no one remembers training.

§ 02

Why the model is the lifecycle, not just the file

Model types you encounter in Salesforce

Five broad types show up in practice. Predictive models output a probability or numeric score (Opportunity Won probability, Lead Conversion score, Case Resolution Time estimate). Classification models output a category (Case Routing label, Sentiment positive/neutral/negative, Email intent type). Embedding models convert text or records into vectors used for similarity search (Data Library retrieval, duplicate detection). Generative models output text, images, or structured content (Agentforce responses, draft emails, Einstein GPT). Recommendation models output a ranked list (Next Best Action recommendations, Einstein Article Recommendations). Each type has different training data needs and different evaluation metrics.
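To make the embedding case concrete, here is a minimal sketch of similarity search used for duplicate detection. The vectors are hand-made stand-ins (a real embedding model would produce them), and the record ids, the `find_likely_duplicates` helper, and the 0.95 threshold are assumptions for illustration, not a Salesforce API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_likely_duplicates(vectors, threshold=0.95):
    """Return pairs of record ids whose vectors are at least `threshold` similar."""
    ids = sorted(vectors)
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if cosine_similarity(vectors[left], vectors[right]) >= threshold:
                pairs.append((left, right))
    return pairs


# Hypothetical vectors; in practice an embedding model generates these.
RECORD_VECTORS = {
    "acct-001": [0.90, 0.10, 0.30],
    "acct-002": [0.88, 0.12, 0.31],
    "acct-003": [0.10, 0.95, 0.20],
}

print(find_likely_duplicates(RECORD_VECTORS))
```

The pairwise loop is fine for a handful of records; at real volumes a vector index would replace it.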

Where models live: managed, custom, brokered

Salesforce-managed models are trained, hosted, and updated by Salesforce. Examples: the Atlas Reasoning Engine model, Einstein Activity Capture's email classifier, Sales Cloud's standard Opportunity Scoring. Customers use them without seeing the underlying training data or weights. Customer-trained models are trained on the customer's own data. Examples: Prediction Builder predictions, Einstein Discovery story models, custom Apex prediction calls. The customer owns the training data and the resulting model. Brokered third-party models are called through the Einstein Trust Layer connector framework. The customer pays the third-party vendor for usage and Salesforce handles the connection plumbing.

Training data, labels, and the leakage problem

Training data is the foundation. A model that predicts which opportunities will close needs historical opportunities with known outcomes (won or lost) and the features that were known before the outcome was decided. The biggest training error is data leakage: including features that were only filled in after the outcome was known. A Close Date Update Count field that gets bumped every time a rep edits the deal correlates strongly with closed deals only because reps update closed deals more. The model learns the wrong thing. Auditing the training feature list against the "what was knowable at prediction time" question catches almost all leakage cases.
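The audit described above can be kept as a small, reviewable artifact. The sketch below assumes someone has already answered the "knowable at prediction time" question for each feature by hand; the `Feature` class, the field names, and the `Close_Date_Update_Count__c` API name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    # True when the value is already fixed at the moment the prediction is made.
    known_at_prediction_time: bool


def split_leaky_features(features):
    """Separate features that are safe to train on from those that leak the outcome."""
    safe = [f.name for f in features if f.known_at_prediction_time]
    leaky = [f.name for f in features if not f.known_at_prediction_time]
    return safe, leaky


# Hypothetical feature list for an opportunity win model.
CANDIDATE_FEATURES = [
    Feature("Amount", True),
    Feature("LeadSource", True),
    # Keeps changing after the outcome is known, so it leaks the label.
    Feature("Close_Date_Update_Count__c", False),
]

safe, leaky = split_leaky_features(CANDIDATE_FEATURES)
print("train on:", safe)
print("remove:", leaky)
```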

Evaluation metrics that match the business goal

Academic ML emphasizes accuracy, precision, recall, F1, and AUC. Salesforce production use rarely cares about those metrics directly. A case classification model that is 90 percent accurate is useless if the 10 percent it gets wrong are the high-priority cases. The right evaluation metric is the business decision metric: time-to-first-response, case resolution rate, opportunity win rate, agent deflection rate. Build the model evaluation to measure the business metric directly when possible. When that is not possible (for example, the business metric takes weeks to observe), use a proxy that correlates strongly with the business metric and validate the correlation periodically.
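A small worked example shows why overall accuracy can mislead. The counts below are synthetic and chosen only to reproduce the 90 percent versus 75 percent contrast; `make_results` and the field names are assumptions for illustration.

```python
def accuracy(results):
    """Share of all cases classified correctly."""
    return sum(1 for r in results if r["correct"]) / len(results)


def high_priority_hit_rate(results):
    """Share of high-priority cases classified correctly."""
    high = [r for r in results if r["high_priority"]]
    return sum(1 for r in high if r["correct"]) / len(high)


def make_results(total, high_priority, high_correct, low_correct):
    """Build a synthetic result set with the given counts."""
    results = []
    for i in range(high_priority):
        results.append({"high_priority": True, "correct": i < high_correct})
    for i in range(total - high_priority):
        results.append({"high_priority": False, "correct": i < low_correct})
    return results


# Model A: every routine case right, every high-priority case wrong.
model_a = make_results(total=100, high_priority=10, high_correct=0, low_correct=90)
# Model B: every high-priority case right, 65 of 90 routine cases right.
model_b = make_results(total=100, high_priority=10, high_correct=10, low_correct=65)

for name, results in (("A", model_a), ("B", model_b)):
    print(name, accuracy(results), high_priority_hit_rate(results))
```

Model A wins on accuracy and misses every case that matters; Model B loses on accuracy and catches all of them.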

Deployment, governance, and the Trust Layer

Deploying a model into a Salesforce org means making it callable from a flow, an Apex method, a Lightning component, or an Agentforce action. Each call path needs governance. The Einstein Trust Layer sits between every model call and the underlying model, masking PII, applying data residency rules, logging the prompt and response, and preventing the model from being trained on the customer's data. The Trust Layer is non-optional for Salesforce-managed and brokered third-party models. For customer-trained models, the equivalent compliance work falls to the customer; treat it as required, not as best practice.
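For customer-trained models, one piece of that compliance work is masking obvious PII before text reaches the model and logging what was sent. The sketch below is not how the Einstein Trust Layer is implemented; the two regular expressions, the `call_model` parameter, and the log format are assumptions for illustration, and real PII detection needs far more than two patterns.

```python
import re

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text):
    """Replace email addresses and simple phone numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


def governed_call(prompt, call_model, audit_log):
    """Mask the prompt, call the model, and record both sides of the exchange.

    `call_model` is a hypothetical callable that takes a string and returns a string.
    """
    masked = mask_pii(prompt)
    response = call_model(masked)
    audit_log.append({"prompt": masked, "response": response})
    return response


audit_log = []
reply = governed_call(
    "Email jane.doe@example.com or call 415-555-0100.",
    lambda text: "echo: " + text,  # stand-in for a real model call
    audit_log,
)
print(reply)
```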

Monitoring, drift, and the retrain decision

Models degrade in production. The underlying world changes (new product launches, seasonal patterns, regulatory changes), the input distribution shifts, and the model trained on last year's data performs worse this quarter than last. Monitoring catches this. The standard pattern is to track prediction accuracy against ground truth on a sliding window, alert when accuracy drops below a threshold, and trigger retraining. Retraining is not free; it costs compute, validation time, and rollout coordination. Most teams retrain on a fixed cadence (monthly or quarterly) plus on-demand when drift is detected, rather than retraining continuously.
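The sliding-window pattern can be sketched in a few lines. The class name, the window size, and the threshold below are assumptions; in an org the predictions and outcomes would come from records rather than from hard-coded calls.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window_size` predictions."""

    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # oldest results drop off automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the ground truth."""
        self.window.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def needs_retraining(self):
        """True once the window is full and accuracy is below the threshold."""
        if len(self.window) < self.window.maxlen:
            return False
        return self.accuracy() < self.threshold


monitor = AccuracyMonitor(window_size=4, threshold=0.75)
for predicted, actual in [("won", "won"), ("lost", "lost"), ("won", "lost"), ("won", "lost")]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_retraining())
```

Waiting for a full window before alerting avoids retraining on the strength of the first few results.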

Retirement and the model registry

Retirement is the underdiscussed phase. Models accumulate in production orgs: old Einstein Discovery stories that no one remembers training, brokered third-party models from vendors that have since been replaced, custom prediction connections that point to deprecated services. Each one is a security and accuracy liability. A model registry that lists every active model with owner, last training date, performance metrics, and next review date is the operational discipline that prevents accumulation. Salesforce's Model Manager surface helps for Einstein and Agentforce models; for custom models, build the registry yourself.
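For custom models, a self-built registry can start as a simple data structure with one query: which models are due for review. The sketch below is one possible shape; the `ModelRecord` fields mirror the list above, and every name, date, and metric value in the sample data is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelRecord:
    name: str
    owner: str
    last_trained: date
    next_review: date
    hosting: str  # "managed", "custom", or "brokered"
    metrics: dict = field(default_factory=dict)


def overdue_for_review(registry, today):
    """Names of models whose review date has arrived or passed."""
    return [m.name for m in registry if m.next_review <= today]


# Hypothetical registry entries.
REGISTRY = [
    ModelRecord("Opportunity win score", "sales-ops", date(2024, 1, 15), date(2024, 4, 15), "custom"),
    ModelRecord("Case routing classifier", "service-ops", date(2024, 3, 1), date(2024, 9, 1), "custom"),
]

print(overdue_for_review(REGISTRY, today=date(2024, 5, 1)))
```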

§ 03

How to ship an AI model into a Salesforce org without future regret

The lifecycle matters more than the model. A great model deployed without a monitoring and retirement plan becomes a future incident. The sequence below makes the operational decisions explicit before training starts, which is where most teams underinvest.

Define the business decision the model will inform
Write one sentence: "this model will let us decide X." If you cannot finish that sentence, you are not ready to train. Examples: route case Y to queue Z, score opportunity W on win probability.
Pick the model type that matches the decision
Predictive for probabilities, classification for categories, embedding for similarity, generative for text, recommendation for ranked lists. The type drives training data needs and evaluation metrics.
Audit training data for leakage
For every feature in the training data, ask: was this value knowable at the moment we want the prediction made? Features filled in after the outcome was decided are leakage and need to be removed.
Pick evaluation metrics that map to the business decision
Skip generic accuracy. Use deflection rate, win rate, resolution time, or the closest direct measurement of the business decision. Validate that the proxy correlates with the metric you actually care about.
Train, evaluate, and pilot before broad rollout
Train in a sandbox or scratch environment. Evaluate on a holdout set. Pilot on a small user or queue subset for two to four weeks before broad rollout. Pilot data is the only honest evaluation.
Wire monitoring before broad rollout
Set up a recurring report that tracks prediction accuracy against ground truth, alerts on threshold breaches, and shows distribution shift in inputs. Without monitoring, drift goes unnoticed until users complain.
Register the model with owner and review cadence
Add the model to a registry (Model Manager for Einstein, your own spreadsheet for custom). List owner, last training date, performance metrics, next review date. The registry is the discipline that prevents the model from becoming someone else's problem in two years.
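The proxy validation in the evaluation-metric step can be checked with a plain Pearson correlation. This is a minimal sketch: the weekly figures are invented, the 0.8 cutoff is an assumption rather than a recommended value, and `proxy_is_valid` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def proxy_is_valid(proxy_values, business_values, min_correlation=0.8):
    """True when the proxy tracks the business metric closely enough."""
    return pearson(proxy_values, business_values) >= min_correlation


# Invented weekly values: a fast proxy score and the slower business metric.
weekly_proxy = [0.60, 0.65, 0.70, 0.72, 0.80]
weekly_win_rate = [0.30, 0.33, 0.36, 0.35, 0.42]

print(proxy_is_valid(weekly_proxy, weekly_win_rate))
```

Rerunning the check on each review date is one way to catch a proxy that has drifted away from the metric it stands in for.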

Key options

Model hosting

Salesforce-managed, customer-trained, or brokered third-party. Drives governance and cost model.

Trust Layer policies

Masking, residency, and logging rules applied to every model call. Non-optional for managed and brokered; equivalent compliance work required for custom models.

Training cadence

Fixed cadence (monthly, quarterly) plus on-demand retraining on drift detection. Continuous retraining is rare in production.

Evaluation metric set

Business decision metric plus a proxy metric for fast iteration. Validate that the proxy correlates with the business metric.

Model owner

Named individual or team responsible for the model's training, evaluation, monitoring, and retirement. Required for registry hygiene.

Gotchas

Data leakage in training features is the single most common cause of models that look great in evaluation and fail in production. Audit every feature for "was this knowable at prediction time."
Academic accuracy metrics rarely match business goals. A 90 percent accurate model that fails on the 10 percent of high-impact cases is worse than a 75 percent model that gets the important ones right.
Drift is not optional. Models trained today perform worse next quarter, and worse still next year. Monitoring is the only way to know when to retrain.
Retired models accumulate in production. Custom prediction connections, old Einstein Discovery stories, and brokered models from former vendors become liabilities. A registry with review dates prevents accumulation.
The Trust Layer is non-optional for Salesforce-managed and brokered models, but does not automatically apply to custom-trained models. Build the equivalent compliance work for custom models explicitly.

Trust & references

Sources

Cross-checked against the following references.

Salesforce Einstein overview (Salesforce)
Agentforce model usage (Salesforce)

Official documentation

Straight from the source - Salesforce's reference material on AI Model.

Einstein Platform Overview (Salesforce Help)
Einstein Trust Layer (Salesforce Help)

Keep learning

Hands-on resources to go deeper on AI Model.

Get Smart with Salesforce Einstein (resource)

About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
