AI · Intermediate

Large Language Model


§ 01

Definition

A large language model (LLM) is a neural network trained on enormous text corpora that can generate, summarize, translate, classify, and reason about natural language. Inside Salesforce, LLMs power Agentforce reasoning and responses, Einstein GPT generation features, Prompt Builder outputs, Einstein Copilot conversations, and a growing list of generative AI capabilities across Sales, Service, Marketing, and Commerce Clouds. The LLM is the foundation; the platform layer around it (grounding, masking, governance) is what makes it usable on regulated business data.

For most generative features, Salesforce does not run a foundation LLM of its own. The platform brokers calls to vendor models (Anthropic, OpenAI, Google, plus Salesforce-tuned variants of vendor models) through the Einstein Trust Layer. The Trust Layer enforces data masking, data residency, and zero-data-retention agreements under which vendors neither store customer prompts nor train on them, protections that vendor models accessed directly do not provide by default. The choice of model per feature is largely a Salesforce-managed decision, with admin overrides available for specific use cases through Model Manager and Bring Your Own LLM.

§ 02

Why the model choice matters less than the grounding around it

What an LLM does and where it lives in Salesforce

An LLM accepts a prompt (text input plus optional context) and returns generated text. The text can be a short response, a structured JSON document, a code snippet, or a long-form draft, depending on the prompt design. In Salesforce, LLMs power the final response synthesis in Agentforce, the draft email in the SDR Agent, the case summary in the Service Coach, the prompt templates in Prompt Builder, the chat in Einstein Copilot, and the reasoning steps in the Atlas Reasoning Engine. The platform layer wraps every call with grounding (action outputs, record data, Knowledge chunks) so the model generates against real business context rather than its training data alone.
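To make the grounding step concrete, here is a minimal Python sketch of how a feature could assemble a grounded prompt before the model call. It is an illustration under assumptions, not Salesforce's implementation: the `build_grounded_prompt` helper, the record fields, and the prompt wording are hypothetical, and on the platform this merge happens in Prompt Builder templates and the Trust Layer rather than in customer code.

```python
from typing import Mapping, Sequence


def build_grounded_prompt(
    instruction: str,
    record: Mapping[str, str],
    knowledge_chunks: Sequence[str],
) -> str:
    """Compose a prompt that puts business context ahead of the task.

    Hypothetical helper for illustration only; Prompt Builder performs
    the equivalent merge from template fields on the platform.
    """
    # Render record fields as "Field: value" lines in a stable order.
    record_lines = [f"{field}: {value}" for field, value in sorted(record.items())]
    # Number the Knowledge chunks so the model can refer to them.
    chunk_lines = [f"[{i}] {chunk}" for i, chunk in enumerate(knowledge_chunks, start=1)]
    return "\n".join(
        [
            "Answer using only the context below. If the context is insufficient, say so.",
            "",
            "## Record",
            *record_lines,
            "",
            "## Knowledge",
            *chunk_lines,
            "",
            "## Task",
            instruction,
        ]
    )


# Hypothetical case record and Knowledge chunk.
prompt = build_grounded_prompt(
    "Summarize the case for the next agent.",
    {"Subject": "Router keeps rebooting", "Status": "Escalated"},
    ["Firmware 2.1 fixes a reboot loop on model X."],
)
print(prompt)
```

Placing the context before the task and telling the model to admit when the context is insufficient are common prompt-design choices, not platform requirements.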

The Einstein Trust Layer and why it matters

Vendor LLMs accessed directly come with their own privacy and training terms, which often do not match enterprise compliance requirements. The Einstein Trust Layer sits between every Salesforce feature and the underlying LLM and adds PII detection and masking on prompts, data residency enforcement, audit logging of every call, and a contractual zero-data-retention agreement with each vendor, so prompts and responses are neither stored by the vendor nor used for training. Customers see a unified interface; the vendor model receives masked prompts instead of the detected sensitive values, and it never trains on customer data. This is the operational difference that makes LLMs usable in production for regulated industries like financial services and healthcare.
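The masking step can be pictured as a reversible substitution: detected values are swapped for placeholders before the prompt leaves the org and restored in the response. The Python sketch below is illustrative only. The Trust Layer's real detection covers more PII categories and uses its own patterns and placeholder format; the `mask` and `unmask` names and the two regular expressions here are assumptions.

```python
import re
from typing import Dict, Tuple

# Hypothetical detection patterns; real PII detection is far broader.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected values with placeholders and keep a lookup table."""
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            # Numbered placeholders stay unique across all categories.
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(substitute, text)
    return text, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original values in the model's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


masked, table = mask("Email jane@example.com or call +1 415 555 0100.")
print(masked)  # Email <EMAIL_1> or call <PHONE_2>.
print(unmask("Reply sent to <EMAIL_1>.", table))
```

Masking hides the detected values only; the rest of the prompt, including any record data used for grounding, still reaches the model.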

Model choice: managed vs Bring Your Own LLM

Most Salesforce features use a Salesforce-managed model choice. The platform picks the vendor model suited to the task (a large model for complex reasoning, a faster, smaller model for short responses) and may change that choice from release to release as vendor offerings change. Customers do not pick the model per call; they pick the feature, and the platform picks the model. Bring Your Own LLM lets enterprise customers connect a specific model (Azure OpenAI, an on-premises model, a different vendor) for specific features through the Models API. The trade-off is control versus operational burden; most customers stay with managed and let Salesforce handle vendor selection.
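The per-feature decision can be pictured as a small routing table. In this Python sketch, the feature keys, the `ModelChoice` type, and the model identifiers are hypothetical placeholders, not Salesforce configuration names; on the platform, managed routing is handled by Salesforce and BYOLLM models are registered through setup tooling rather than in code like this.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelChoice:
    mode: str                       # "managed" or "byollm"
    model_id: Optional[str] = None  # only set for BYOLLM


# Hypothetical per-feature routing; every feature not listed stays managed.
ROUTING: Dict[str, ModelChoice] = {
    "case_summary": ModelChoice("managed"),
    "bulk_email_draft": ModelChoice("byollm", "azure-openai/my-deployment"),
}


def resolve(feature: str) -> ModelChoice:
    """Return the model choice for a feature, defaulting to managed."""
    choice = ROUTING.get(feature, ModelChoice("managed"))
    if choice.mode == "byollm" and not choice.model_id:
        raise ValueError(f"BYOLLM feature {feature!r} needs a model_id")
    return choice


print(resolve("bulk_email_draft"))
print(resolve("sales_coach"))
```

Defaulting unlisted features to managed matches the guidance in this entry: override only where volume or compliance justifies the extra operational work.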

Grounding, retrieval, and the hallucination question

LLMs hallucinate: they can generate plausible-sounding text that is not grounded in any real source. Salesforce mitigates this through grounding: action outputs, record data, Data Library chunks, and Knowledge articles are injected as context into the prompt with explicit instructions to base the response on that context. Retrieval-augmented generation (RAG) extends this by querying a Data Library for relevant chunks before the prompt is composed. Hallucination still occurs, but typically far less often than with ungrounded generation. The remaining hallucinations are best caught by Testing Center test cases that assert forbidden content is absent and by sampled human review of high-impact outputs.
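A toy retrieval step shows the shape of RAG: score the stored chunks against the question, keep the top few, and place them in the prompt with an instruction to answer only from them. This Python sketch swaps in plain word-overlap scoring where a Data Library relies on the platform's own search index; the `retrieve` and `rag_prompt` names, the scoring, and the prompt wording are assumptions for illustration.

```python
import re
from typing import List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a stand-in for real embeddings or search."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: Sequence[str], k: int = 2) -> List[str]:
    """Return up to k chunks sharing the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(chunk)), chunk) for chunk in chunks]
    # Drop chunks with no overlap; sorting is stable, so ties keep input order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def rag_prompt(question: str, chunks: Sequence[str]) -> str:
    """Compose a prompt that restricts the answer to retrieved context."""
    context = retrieve(question, chunks)
    lines = [f"- {c}" for c in context] or ["- (no relevant context found)"]
    return (
        "Answer only from the context. If it does not contain the answer, say you don't know.\n"
        "Context:\n" + "\n".join(lines) + f"\nQuestion: {question}"
    )


# Hypothetical Data Library chunks.
docs = [
    "Refunds are issued within 14 days of a return.",
    "Our office is closed on public holidays.",
    "Return shipping is free for damaged items.",
]
print(rag_prompt("How many days until refunds are issued?", docs))
```

Returning an explicit "no relevant context" marker, instead of an empty context section, gives the model a clear signal to decline rather than improvise.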

Cost, latency, and the per-call economics

LLM calls consume tokens, and tokens cost money. Salesforce meters token usage per feature and bills it back through the Agentforce conversation model or the Einstein generation quota, depending on the feature. Latency varies by model: as a rough guide, larger models take 2 to 5 seconds for a typical Salesforce response, while smaller models can respond in under 1 second. The platform's per-feature model defaults largely abstract the cost-latency-quality trade-off away from customers, but high-volume features can benefit from custom model selection or from BYOLLM economics for specific use cases.
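The per-call economics reduce to simple arithmetic: input and output tokens per call, multiplied by a rate, multiplied by volume. The Python sketch below uses hypothetical per-1,000-token prices and call volumes; Salesforce bills usage in its own units rather than raw vendor token prices, so treat this as a way to compare relative cost, not as a price quote.

```python
from decimal import Decimal


def monthly_cost(
    calls_per_day: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: Decimal,
    price_out_per_1k: Decimal,
    days: int = 30,
) -> Decimal:
    """Estimate monthly spend from average token counts per call."""
    per_call = (
        Decimal(input_tokens) / 1000 * price_in_per_1k
        + Decimal(output_tokens) / 1000 * price_out_per_1k
    )
    return per_call * calls_per_day * days


# Hypothetical prices for a large and a small model at the same volume.
large = monthly_cost(5000, 1500, 300, Decimal("0.005"), Decimal("0.015"))
small = monthly_cost(5000, 1500, 300, Decimal("0.0005"), Decimal("0.0015"))
print(large, small)
```

`Decimal` avoids binary floating-point rounding in money arithmetic. With these made-up numbers the large model costs 1,800 per month against 180 for the small one; whether the quality difference is worth that gap is the per-feature judgment the paragraph above describes.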

Where LLMs help and where they hurt

LLMs are excellent at: drafting text in a specified tone, summarizing long content, classifying messages into broad categories, answering questions when grounding is provided. LLMs are bad at: precise numerical reasoning, deterministic record updates without confirmation, anything requiring strict regulatory script adherence, and tasks where the cost of being wrong exceeds the cost of paying a human to do it correctly. The Salesforce design pattern is to use LLMs for the soft parts (drafting, summarizing, classifying) and traditional logic (Apex, Flow, validation rules) for the hard parts (calculations, deterministic updates, audit-critical writes).
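The split between soft and hard parts can be sketched in a few lines: compute the number deterministically, then hand the finished figure to the model only as text to phrase. In this Python sketch, `quote_total` is a hypothetical stand-in for logic that would live in Apex or Flow, and the prompt wording and currency are assumptions; the point is that the model is never asked to do the arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP


def quote_total(unit_price: Decimal, quantity: int, discount_pct: Decimal) -> Decimal:
    """Deterministic pricing: the kind of logic to keep out of the LLM."""
    gross = unit_price * quantity
    net = gross * (Decimal(100) - discount_pct) / Decimal(100)
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def drafting_prompt(customer: str, total: Decimal) -> str:
    # Hypothetical prompt: the model only phrases the message; the figure is fixed.
    return (
        f"Write a short, friendly note to {customer} confirming their quote. "
        f"State the total exactly as {total} USD and do not recalculate it."
    )


total = quote_total(Decimal("199.99"), 3, Decimal("12.5"))
print(total)  # 524.97
print(drafting_prompt("Acme Corp", total))
```

The instruction not to recalculate lowers the risk but does not remove it, so a deterministic check that the drafted text contains the exact figure is a cheap safeguard before sending.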

Evaluation, monitoring, and the production discipline

An LLM in production needs the same monitoring discipline as any other AI model. The Salesforce platform surfaces token usage, latency, and acceptance rate (whether the output was kept or revised) per feature. Custom prompts in Prompt Builder should ship with a test set in Testing Center that asserts the structural properties of the output. Weekly review of a random conversation sample catches drift that test sets miss. Underused features should be retired rather than left running; an enabled LLM feature that no one uses still widens the surface the org's compliance reviews must cover.
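Testing Center test cases are defined in Salesforce rather than in Python, but the logic of a structural check is easy to show. This Python sketch is a stand-in under assumptions: the `Expectation` fields, the `check` function, and the example strings (including the competitor name) are hypothetical, and it covers only hard content rules (must include, must not include). Tone, which the entry recommends treating as a soft expectation, is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expectation:
    # Hypothetical structure mirroring "must appear" / "must not appear" rules.
    must_include: List[str] = field(default_factory=list)
    must_exclude: List[str] = field(default_factory=list)


def check(output: str, expectation: Expectation) -> List[str]:
    """Return a list of failures; an empty list means the output passed."""
    text = output.lower()
    failures = [f"missing: {s}" for s in expectation.must_include if s.lower() not in text]
    failures += [f"forbidden: {s}" for s in expectation.must_exclude if s.lower() in text]
    return failures


exp = Expectation(must_include=["case 00001234"], must_exclude=["guarantee", "CompetitorCo"])
print(check("Case 00001234 is resolved; we guarantee a refund.", exp))
# ['forbidden: guarantee']
```

Case-insensitive substring matching is deliberately crude (it would also flag "guaranteed"); that bias toward false alarms is usually acceptable for forbidden-content rules, where a missed violation costs more than a manual second look.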

§ 03

How to use LLMs in a Salesforce org without surprises

LLMs in Salesforce are not a feature you toggle on. They are a layer behind every Agentforce, Einstein GPT, and Prompt Builder capability. The work that matters is choosing the right features for your use cases, configuring the Trust Layer correctly, and putting evaluation and monitoring in place before broad rollout.

  1. Inventory the LLM-powered features you plan to use

    List the Agentforce agents, Einstein GPT features, and Prompt Builder templates the team will turn on. Each one is a separate evaluation question.

  2. Configure Trust Layer masking and residency rules

    In Setup, open Einstein Trust Layer. Confirm that PII masking is on for the data types your org handles and that data residency matches your regional requirements. The defaults are usually correct but worth verifying explicitly.

  3. Decide between managed and Bring Your Own LLM per feature

    For most features, stay with managed. For features with extreme volume or specific compliance needs, evaluate BYOLLM. The decision is per feature, not org-wide.

  4. Ground every prompt with explicit context

    Custom Prompt Builder templates should always include record data, Data Library chunks, or other grounding context. Ungrounded prompts produce hallucinations at a much higher rate.

  5. Build Testing Center test sets that assert structural properties

    Assert what must appear (specific values, citations), what must not appear (forbidden phrases, competitor names), and what tone the response must hit. Soft expectations for tone, hard expectations for content.

  6. Pilot LLM features for two to four weeks before broad rollout

    Pilot data is the only honest evaluation. Vendor benchmarks rarely match real org performance. Two weeks of pilot data tells you what your users actually experience.

  7. Schedule weekly review of a random output sample

    Pull 50 random LLM outputs per feature per week. Review with the feature owner. Catch drift, hallucination, and tone issues before users complain. This work never ends.
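The weekly review in step 7, together with the acceptance-rate metric this entry lists for production monitoring, can be prototyped on an export of logged outputs. This Python sketch assumes a hypothetical record shape (`feature`, `accepted`, `text`); it is not the schema of any Salesforce log or audit object. It draws up to 50 random outputs per feature for human review and reports the share that users kept without revising.

```python
import random
from collections import defaultdict
from typing import Dict, List, TypedDict


class LoggedOutput(TypedDict):
    # Hypothetical export shape, not a Salesforce object.
    feature: str
    accepted: bool  # True if the user kept the output without revising it
    text: str


def weekly_sample(
    logs: List[LoggedOutput], per_feature: int = 50, seed: int = 0
) -> Dict[str, List[LoggedOutput]]:
    """Pick up to `per_feature` random outputs for each feature."""
    rng = random.Random(seed)  # seeded so a review batch can be reproduced
    by_feature: Dict[str, List[LoggedOutput]] = defaultdict(list)
    for row in logs:
        by_feature[row["feature"]].append(row)
    return {
        feature: rng.sample(rows, min(per_feature, len(rows)))
        for feature, rows in by_feature.items()
    }


def acceptance_rate(logs: List[LoggedOutput], feature: str) -> float:
    """Share of a feature's outputs that were kept as-is (0.0 if none logged)."""
    rows = [r for r in logs if r["feature"] == feature]
    return sum(r["accepted"] for r in rows) / len(rows) if rows else 0.0
```

The fixed seed makes a batch reproducible for the review meeting; changing it each week (for example, to the week number) keeps reviewers seeing fresh outputs.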

Key options
Model choice (managed vs BYOLLM)

Whether the feature uses a Salesforce-managed model selection or a customer-specified vendor model. Trade-off is control vs operational burden.

Trust Layer masking

Which PII categories the Trust Layer masks before sending prompts to the model. Defaults handle common categories; org-specific patterns can be added.

Data residency

Which geographic region processes LLM calls. Critical for GDPR and other regional data sovereignty requirements.

Grounding sources

Which record data, Data Libraries, or Knowledge articles are injected as context into LLM prompts.

Evaluation policy

The set of Testing Center expectations and sampling cadence that ensures ongoing output quality.

Gotchas
  • Vendor LLMs accessed directly do not have the no-training-on-customer-data guarantee that the Einstein Trust Layer provides. Going around the Trust Layer is a compliance issue, not a shortcut.
  • Ungrounded prompts produce hallucinations at high rates. Every custom prompt should include explicit grounding context.
  • Vendor benchmarks rarely match real org performance. Pilot with your actual data and your actual users before committing to a feature broadly.
  • LLMs are bad at precise numerical reasoning. Do not use them for financial calculations; route to deterministic Apex or Flow logic for those.
  • Unused LLM features still widen the surface that compliance reviews must cover. Retire features that no one uses rather than letting them accumulate.
§


About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.

§

Test your knowledge

Q1. What is a Large Language Model?

Q2. How does Salesforce wrap LLM interactions?

Q3. What LLM providers does Salesforce support?
