What is the most important tip for working with Large Language Models?

The Einstein Trust Layer is the operational difference that makes LLMs usable on regulated data. Configure it explicitly per data type rather than relying on defaults.

AI · Intermediate

Large Language Model

Q: Within the Salesforce platform, what is the role of a large language model (LLM) in features like Agentforce and Prompt Builder?

The LLM is the text-generating neural network that powers response synthesis, summaries, and classification in those features. It is not the database, the sharing engine, or a static analyzer; those are separate platform layers.

Q: How does Salesforce ground an LLM call so the model answers from real business context rather than its training data alone?

Grounding injects record data, action outputs, and Knowledge or Data Library chunks into the prompt so generation is anchored to real context. Grounding is not training: vendor models are contractually barred from training on customer data. Apex triggers and validation rules are unrelated to grounding.

Q: What layer sits between every Salesforce feature and the underlying LLM to mask PII, enforce data residency, and audit-log each model call?

The Einstein Trust Layer masks PII, enforces residency, and audit-logs every LLM call. Bulk API handles data loads, Lightning Web Security isolates client JavaScript, and Shield encrypts stored data, none of which wrap LLM interactions.



§ 01

Definition

A large language model (LLM) is a neural network trained on enormous text corpora that can generate, summarize, translate, classify, and reason about natural language. Inside Salesforce, LLMs power Agentforce reasoning and responses, Einstein GPT generation features, Prompt Builder outputs, Einstein Copilot conversations, and a growing list of generative capabilities across Sales, Service, Marketing, and Commerce Clouds. The LLM is the foundation; the Salesforce platform layer around it (grounding, masking, governance) is what makes it usable on regulated business data.

For most generative features, Salesforce does not supply its own foundation LLM. The platform brokers calls to vendor models (Anthropic, OpenAI, Google, plus Salesforce-tuned variants of vendor models) through the Einstein Trust Layer. The Trust Layer adds data masking and residency enforcement, backed by a no-training-on-customer-data contract that vendor models accessed on their own do not provide. The choice of model per feature is largely a Salesforce-managed decision, with admin overrides available for specific use cases through Model Manager and Bring Your Own LLM.

§ 02

Why the model choice matters less than the grounding around it

What an LLM does and where it lives in Salesforce

An LLM accepts a prompt (text input plus optional context) and returns generated text. The text can be a short response, a structured JSON document, a code snippet, or a long-form draft, depending on the prompt design. In Salesforce, LLMs power the final response synthesis in Agentforce, the draft email in the SDR Agent, the case summary in the Service Coach, the prompt templates in Prompt Builder, the chat in Einstein Copilot, and the reasoning steps in the Atlas Reasoning Engine. The platform layer wraps every call with grounding (action outputs, record data, Knowledge chunks) so the model generates against real business context rather than its training data alone.
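To make the "wraps every call with grounding" step concrete, here is a minimal Python sketch of how a grounded prompt might be assembled before it reaches the model. It is not Salesforce's implementation: the function name `build_grounded_prompt`, the section labels, and the sample record values are hypothetical, chosen only to show record data, action outputs, and Knowledge chunks being placed ahead of the user's request.

```python
from typing import Mapping, Sequence


def build_grounded_prompt(
    instruction: str,
    record: Mapping[str, str],
    action_outputs: Sequence[str],
    knowledge_chunks: Sequence[str],
) -> str:
    """Assemble a prompt that puts business context before the request.

    Hypothetical sketch: section names and ordering are illustrative only.
    """
    lines = [
        "Answer using ONLY the context below. If the context is insufficient, say so.",
        "",
        "## Record",
    ]
    # Record fields become "Field: value" lines.
    lines.extend(f"{name}: {value}" for name, value in record.items())
    lines += ["", "## Action outputs"]
    lines.extend(f"- {output}" for output in action_outputs)
    lines += ["", "## Knowledge"]
    # Numbered chunks let the model cite which source it used.
    lines.extend(f"[{i}] {chunk}" for i, chunk in enumerate(knowledge_chunks, start=1))
    lines += ["", "## Request", instruction]
    return "\n".join(lines)


# Sample values are made up for illustration.
prompt = build_grounded_prompt(
    instruction="Summarize the case for the next agent.",
    record={"CaseNumber": "00001234", "Status": "Escalated"},
    action_outputs=["Order 5521 shipped on 2024-03-02"],
    knowledge_chunks=["Escalated cases must be answered within 4 hours."],
)
print(prompt)
```

Putting the context before the request, together with an explicit "use only the context" instruction, is the same idea the surrounding text describes: the model generates against supplied business data rather than its training data alone.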

The Einstein Trust Layer and why it matters

Vendor LLMs accessed directly come with their own privacy and training terms, which usually do not match enterprise compliance requirements. The Einstein Trust Layer sits between every Salesforce feature and the underlying LLM and adds PII detection and masking on prompts, data residency enforcement, audit logging of every call, and a contractual no-training-on-customer-data agreement with each vendor. Customers see a unified interface; the vendor model never sees the masked PII values and never trains on the data it does receive. This is the operational difference that makes LLMs usable in production for regulated industries like financial services and healthcare.
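The masking and audit-logging part of that contract can be illustrated with a small Python sketch. Detected PII is swapped for placeholders before the prompt leaves the org, the mapping stays local, the masked prompt is logged, and placeholders in the response are swapped back. This is a toy under stated assumptions, not the Trust Layer's actual detector. The two regular expressions catch only simple email addresses and US-style phone numbers, and the placeholder format, audit-record fields, and function names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from typing import Callable

# Hypothetical detectors; a real PII detector covers far more categories.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

AUDIT_LOG: list[dict[str, object]] = []


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with numbered placeholders; return text and mapping."""
    mapping: dict[str, str] = {}

    def substitute(category: str, match: re.Match[str]) -> str:
        token = f"<{category}_{len(mapping) + 1}>"
        mapping[token] = match.group(0)  # original value never leaves this process
        return token

    for category, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, c=category: substitute(c, m), text)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values for any placeholders the model echoed back."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def call_with_trust_controls(prompt: str, model: Callable[[str], str]) -> str:
    masked, mapping = mask(prompt)
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "masked_prompt": masked,  # only the masked form is logged
        "pii_count": len(mapping),
    })
    return unmask(model(masked), mapping)


# Stand-in "model" that echoes its input, so the example runs offline.
reply = call_with_trust_controls(
    "Email jane.doe@example.com or call 415-555-0100.",
    model=lambda p: f"Draft: {p}",
)
print(AUDIT_LOG[-1]["masked_prompt"])  # Email <EMAIL_1> or call <PHONE_2>.
print(reply)
```

The key design point is that the mapping from placeholder to real value is held on the caller's side, so the model only ever receives the placeholder tokens.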

Model choice: managed vs Bring Your Own LLM

Most Salesforce features use a Salesforce-managed model choice. The platform picks the right vendor model for the task (a large model for complex reasoning, a faster smaller model for short responses) and switches the choice per release as vendor offerings change. Customers do not pick the model per call; they pick the feature, and the platform picks the model. Bring Your Own LLM lets enterprise customers connect a specific model (Azure OpenAI, an on-premise model, a different vendor) for specific features through the Models API. The trade-off is control vs operational burden; most customers stay with managed and let Salesforce handle vendor selection.
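The per-feature nature of the decision can be sketched as a simple resolution rule: a feature uses the platform-managed model unless an explicit BYOLLM override exists for that feature. The feature names, model identifiers, and the `FeatureModelConfig` type below are hypothetical placeholders, not Salesforce setup values. The point is only that the override is scoped to one feature rather than to the whole org.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureModelConfig:
    feature: str
    byollm_model: Optional[str] = None  # None means "let the platform choose"


# Hypothetical managed defaults; the platform may change these per release.
MANAGED_DEFAULTS = {
    "case_summary": "managed-small-model",
    "agent_reasoning": "managed-large-model",
}


def resolve_model(config: FeatureModelConfig) -> str:
    """Use the feature's BYOLLM override if one is set, else the managed default."""
    if config.byollm_model is not None:
        return config.byollm_model
    return MANAGED_DEFAULTS[config.feature]


configs = [
    FeatureModelConfig("case_summary"),
    FeatureModelConfig("agent_reasoning", byollm_model="customer-azure-openai-deployment"),
]
for c in configs:
    print(c.feature, "->", resolve_model(c))
```

Keeping the override optional means every feature without one automatically follows whatever the platform selects next release, which is the low-burden default the text recommends.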

Grounding, retrieval, and the hallucination question

LLMs hallucinate. They generate plausible-sounding text that is not grounded in any real source. Salesforce mitigates this through grounding: action outputs, record data, Data Library chunks, and Knowledge articles are injected as context into the prompt with explicit instructions to base the response on the context. Retrieval-augmented generation (RAG) extends this by querying a Data Library for relevant chunks before the prompt is composed. Hallucination still occurs but at much lower rates than ungrounded generation. The remaining hallucination is best caught by Testing Center test cases that assert forbidden content and by sampled human review of high-impact outputs.
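Retrieval-augmented generation can be sketched in a few lines: score the library chunks against the question, keep the best few, and place them in the prompt with an instruction to answer only from them. Real Data Library retrieval uses vector embeddings. The word-overlap score below is a deliberately simple stand-in so the example runs with the standard library alone, and the chunk texts and `top_k` value are invented for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by shared words with the question (stand-in for vector search)."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    # Drop chunks with no overlap at all, then keep the top_k best.
    return [c for c in ranked if q & tokens(c)][:top_k]


def compose_prompt(question: str, context: list[str]) -> str:
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Base your answer only on the numbered context. "
        "If the context does not contain the answer, reply 'I don't know'.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


library = [
    "Refunds are issued within 10 business days of approval.",
    "The warranty covers parts for two years.",
    "Office hours are 9 to 5 on weekdays.",
]
question = "How long do refunds take after approval?"
context = retrieve(question, library)
print(compose_prompt(question, context))
```

Only the refund chunk shares words with the question, so it is the only context passed to the model. The explicit "I don't know" fallback gives the model a sanctioned alternative to inventing an answer when retrieval comes back empty.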

Cost, latency, and the per-call economics

LLM calls cost tokens, and tokens cost money. The Salesforce Trust Layer accounts for token usage per feature and bills back through the Agentforce conversation model or the Einstein generation quota depending on the feature. Latency varies by model: larger models take 2 to 5 seconds for a typical Salesforce response, smaller models can be under 1 second. The cost-latency-quality trade-off is largely abstracted from customers by the platform's per-feature model defaults, but heavy-volume features can benefit from custom model selection or BYOLLM economics for specific use cases.
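The per-call economics reduce to simple arithmetic: input and output tokens per call, times a per-token rate, times call volume. The rates, token counts, and volumes below are made-up placeholders, not Salesforce or vendor pricing. Substitute the figures from your own contract and usage reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    name: str
    usd_per_1k_input: float   # hypothetical rate
    usd_per_1k_output: float  # hypothetical rate


def monthly_cost(rate: ModelRate, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Cost per call (input + output tokens at their rates) times monthly call volume."""
    per_call = (in_tokens / 1000) * rate.usd_per_1k_input + (out_tokens / 1000) * rate.usd_per_1k_output
    return per_call * calls


LARGE = ModelRate("large-model", usd_per_1k_input=0.010, usd_per_1k_output=0.030)
SMALL = ModelRate("small-model", usd_per_1k_input=0.001, usd_per_1k_output=0.002)

for rate in (LARGE, SMALL):
    cost = monthly_cost(rate, calls=100_000, in_tokens=1_500, out_tokens=300)
    print(f"{rate.name}: ${cost:,.2f} per month")
```

With these placeholder numbers the larger model costs roughly ten times as much per month for the same volume, which is why heavy-volume features are the ones where custom model selection is worth evaluating.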

Where LLMs help and where they hurt

LLMs excel at drafting text in a specified tone, summarizing long content, classifying messages into broad categories, and answering questions when grounding is provided. They are weak at precise numerical reasoning, deterministic record updates without confirmation, anything requiring strict regulatory script adherence, and tasks where the cost of being wrong exceeds the cost of paying a human to do it correctly. The Salesforce design pattern is to use LLMs for the soft parts (drafting, summarizing, classifying) and traditional logic (Apex, Flow, validation rules) for the hard parts (calculations, deterministic updates, audit-critical writes).
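The soft/hard split can be shown with a small sketch: the discount is computed with exact decimal arithmetic in ordinary code, and only the finished number is handed to the drafting step. In a Salesforce org the deterministic half would live in Apex or Flow. Python is used here only for illustration, and `draft_with_llm` is a hypothetical stand-in for a model call, implemented as a plain template so the example runs offline.

```python
from decimal import ROUND_HALF_UP, Decimal


def discounted_total(list_price: Decimal, quantity: int, discount_pct: Decimal) -> Decimal:
    """Deterministic, auditable calculation: never delegated to the model."""
    gross = list_price * quantity
    net = gross * (Decimal(1) - discount_pct / Decimal(100))
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def draft_with_llm(facts: dict[str, str]) -> str:
    """Hypothetical stand-in for an LLM drafting call; here just a template."""
    return f"Hi {facts['name']}, your quote total is ${facts['total']}."


total = discounted_total(Decimal("199.99"), 3, Decimal("12.5"))
print(draft_with_llm({"name": "Ana", "total": str(total)}))
```

The model only ever sees the already-computed total as a fact to phrase, so a drafting mistake can affect tone but cannot change the number.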

Evaluation, monitoring, and the production discipline

An LLM in production needs the same monitoring discipline as any other AI model. The Salesforce platform surfaces token usage, latency, and acceptance rate (whether the output was kept or revised) per feature. Custom prompts in Prompt Builder should ship with a test set in Testing Center that asserts the structural properties of the output. Weekly review of a random conversation sample catches drift that test sets miss. Underused features should be retired rather than left running; an LLM feature that no one uses still consumes Trust Layer capacity and adds to the org's compliance review surface.
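The monitoring signals named above can be rolled up per feature with very little code. The sketch below computes an acceptance rate (share of outputs kept without revision), flags enabled features whose weekly usage falls under a retirement threshold, and draws a reproducible random review sample. The `Output` record shape, the 20-call threshold, the feature names, and the sample size are hypothetical choices, not platform settings.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Output:
    feature: str
    accepted: bool  # True if the user kept the output without revising it


def weekly_report(
    outputs: list[Output], enabled_features: list[str], retire_below: int = 20
) -> dict[str, dict[str, object]]:
    """Per enabled feature: call count, acceptance rate, and a retirement flag."""
    by_feature: dict[str, list[Output]] = defaultdict(list)
    for o in outputs:
        by_feature[o.feature].append(o)
    report: dict[str, dict[str, object]] = {}
    for feature in enabled_features:
        items = by_feature.get(feature, [])
        calls = len(items)
        rate: Optional[float] = sum(o.accepted for o in items) / calls if calls else None
        report[feature] = {
            "calls": calls,
            "acceptance_rate": rate,
            "retire_candidate": calls < retire_below,  # unused features get flagged too
        }
    return report


def review_sample(outputs: list[Output], feature: str, k: int = 50, seed: int = 0) -> list[Output]:
    """Reproducible random sample of up to k outputs for human review."""
    pool = [o for o in outputs if o.feature == feature]
    return random.Random(seed).sample(pool, min(k, len(pool)))


outputs = (
    [Output("case_summary", True)] * 30
    + [Output("case_summary", False)] * 10
    + [Output("email_draft", True)] * 5
)
report = weekly_report(outputs, ["case_summary", "email_draft", "old_feature"])
sample = review_sample(outputs, "case_summary")
print(report)
```

Iterating over the enabled features, rather than only over features that produced output, is what lets a feature with zero calls show up as a retirement candidate instead of silently disappearing from the report.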

§ 03

How to use LLMs in a Salesforce org without surprises

LLMs in Salesforce are not a feature you toggle on. They are a layer behind every Agentforce, Einstein GPT, and Prompt Builder capability. The work that matters is choosing the right features for your use cases, configuring the Trust Layer correctly, and putting evaluation and monitoring in place before broad rollout.

Inventory the LLM-powered features you plan to use
List the Agentforce agents, Einstein GPT features, and Prompt Builder templates the team will turn on. Each one is a separate evaluation question.
Configure Trust Layer masking and residency rules
In Setup, open Einstein Trust Layer. Confirm PII masking is on for the data types your org handles and that data residency matches your regional requirements. These defaults are usually correct but worth verifying explicitly.
Decide between managed and Bring Your Own LLM per feature
For most features, stay with managed. For features with extreme volume or specific compliance needs, evaluate BYOLLM. The decision is per feature, not org-wide.
Ground every prompt with explicit context
Custom Prompt Builder templates should always include record data, Data Library chunks, or other grounding context. Ungrounded prompts produce hallucinations at a much higher rate.
Build Testing Center test sets that assert structural properties
Assert what must appear (specific values, citations), what must not appear (forbidden phrases, competitor names), and what tone the response must hit. Soft expectations for tone, hard expectations for content.
Pilot LLM features for two to four weeks before broad rollout
Pilot data is the only honest evaluation. Vendor benchmarks rarely match real org performance. Two weeks of pilot data tells you what your users actually experience.
Schedule weekly review of a random output sample
Pull 50 random LLM outputs per feature per week. Review with the feature owner. Catch drift, hallucination, and tone issues before users complain. This work never ends.
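The hard content expectations from the steps above (what must appear and what must not) can be expressed as a plain function to make the idea concrete. This is a hypothetical sketch, not the Testing Center API or its expectation format. The `Expectations` type, the required and forbidden phrases, and the sample outputs are invented, and "CompetitorCo" is a made-up competitor name.

```python
from dataclasses import dataclass, field


@dataclass
class Expectations:
    must_include: list[str] = field(default_factory=list)
    must_not_include: list[str] = field(default_factory=list)


def check_output(output: str, exp: Expectations) -> list[str]:
    """Return a list of failures; an empty list means every hard check passed."""
    text = output.lower()  # case-insensitive matching
    failures = [f"missing: {p}" for p in exp.must_include if p.lower() not in text]
    failures += [f"forbidden: {p}" for p in exp.must_not_include if p.lower() in text]
    return failures


exp = Expectations(
    must_include=["case 00001234", "[1]"],           # required value and a citation marker
    must_not_include=["guaranteed", "CompetitorCo"],  # forbidden phrase and competitor name
)
good = "Case 00001234 was escalated per policy [1]."
bad = "Case 00001234 is guaranteed to close today; CompetitorCo is slower."
print(check_output(good, exp))  # []
print(check_output(bad, exp))
```

Tone is deliberately left out of this function: the steps treat tone as a soft expectation, which suits sampled human review better than exact string matching.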

Key options

Model choice (managed vs BYOLLM)

Whether the feature uses a Salesforce-managed model selection or a customer-specified vendor model. Trade-off is control vs operational burden.

Trust Layer masking

Which PII categories the Trust Layer masks before sending prompts to the model. Defaults handle common categories; org-specific patterns can be added.

Data residency

Which geographic region processes LLM calls. Critical for GDPR and other regional data sovereignty requirements.

Grounding sources

Which record data, Data Libraries, or Knowledge articles are injected as context into LLM prompts.

Evaluation policy

The set of Testing Center expectations and sampling cadence that ensures ongoing output quality.

Gotchas

Vendor LLMs accessed directly do not have the no-training-on-customer-data guarantee that the Einstein Trust Layer provides. Going around the Trust Layer is a compliance issue, not a shortcut.
Ungrounded prompts produce hallucinations at high rates. Every custom prompt should include explicit grounding context.
Vendor benchmarks rarely match real org performance. Pilot with your actual data and your actual users before committing to a feature broadly.
LLMs are bad at precise numerical reasoning. Do not use them for financial calculations; route to deterministic Apex or Flow logic for those.
Unused LLM features still consume Trust Layer capacity and count against compliance review surface. Retire features that no one uses rather than letting them accumulate.


Trust & references

Sources

Cross-checked against the following references.

Salesforce AI overview (Salesforce)

Official documentation

Einstein Generative AI Overview (Salesforce Help)
Einstein Trust Layer (Salesforce Help)
Bring Your Own LLM (BYOLLM) (Salesforce Help)


About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals. More about Dipojjal.
