Setting up RAG in Salesforce usually means configuring grounding in Prompt Builder against a native source like Knowledge or Data Cloud. The steps below cover the native path; the bring-your-own path swaps the retrieval source for a custom Apex invocable, but the rest of the flow is identical.
- Pick the source corpus
Decide what the model should be able to cite from: published Knowledge articles, files in a Content library, records of a specific object, Data Cloud entities, or a mix. Narrower corpora give better retrieval than mixing everything together.
- Define the chunking strategy
For native sources, Salesforce handles chunking by default. For custom content, decide whether to split by heading, paragraph, or fixed token count. Semantic boundaries beat character counts. Test chunk sizes between 200 and 800 tokens for most use cases.
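If you do roll your own splitter, the sketch below shows the paragraph-boundary approach with a token cap. It is illustrative Apex: the class name is invented, and tokens are approximated as characters divided by four rather than counted by a real tokenizer.

```apex
// Sketch: split on blank lines (paragraph boundaries), then pack paragraphs
// into chunks capped at roughly maxTokens. A single oversized paragraph is
// kept whole rather than split mid-sentence.
public class ParagraphChunker {
    public static List<String> chunk(String text, Integer maxTokens) {
        List<String> chunks = new List<String>();
        String current = '';
        for (String para : text.split('\n\\s*\n')) {
            para = para.trim();
            if (String.isBlank(para)) { continue; }
            // Rough token estimate: characters / 4. Swap in a real tokenizer if available.
            Integer projected = (current.length() + para.length()) / 4;
            if (projected > maxTokens && current != '') {
                chunks.add(current);   // close the current chunk at a semantic boundary
                current = para;        // start the next chunk with this paragraph
            } else {
                current = current == '' ? para : current + '\n\n' + para;
            }
        }
        if (current != '') { chunks.add(current); }
        return chunks;
    }
}
```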
- Ensure the source is indexed
For Knowledge, the article must be published in the right channel and language. For Data Cloud, the Search Index must be configured on the entity. For files, the Content asset must be indexed. Indexing is asynchronous, so check status before testing.
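For the Knowledge case, a quick sanity check is to query the article versions you expect to be retrievable. A sketch in anonymous Apex; the language code is an example, so adjust it to your channels:

```apex
// Sketch: confirm the articles you plan to ground on are published ('Online')
// in the language you will retrieve in. KnowledgeArticleVersion queries
// must filter on PublishStatus and Language.
List<KnowledgeArticleVersion> published = [
    SELECT Id, Title, UrlName, LastPublishedDate
    FROM KnowledgeArticleVersion
    WHERE PublishStatus = 'Online'
      AND Language = 'en_US'
    LIMIT 50
];
System.debug(published.size() + ' published article versions found.');
```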
- Wire up the prompt template
In Prompt Builder, add the source as a Resource. Configure the retrieval query and top-k. Reference the retrieved content in the prompt body using merge syntax. Add a citation field to the output schema.
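An illustrative template body is shown below. The `{!$Input:...}` form is the usual input merge pattern, but the grounding reference is left as a placeholder because the exact merge token depends on the resource name and is generated by Prompt Builder; verify both against what your template actually produces.

```
You are a service assistant. Answer using ONLY the passages below.

Case subject: {!$Input:Case.Subject}

Retrieved passages:
{!<merge reference Prompt Builder generates for your grounding resource>}

Cite the source article title for every claim.
```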
- Preview, test, then gate behind a permission set
Preview with several record IDs covering rich, sparse, and edge-case contexts. Activate the template and grant Use Prompt Template via a permission set, starting with a pilot group.
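Assigning the pilot can be scripted. The sketch below assumes a permission set named Prompt_Template_Pilot that grants the template access and a public group RAG_Pilot_Users containing only users; both names are placeholders.

```apex
// Sketch: assign the pilot permission set to every user in the pilot group.
PermissionSet ps = [
    SELECT Id FROM PermissionSet WHERE Name = 'Prompt_Template_Pilot' LIMIT 1
];
List<PermissionSetAssignment> assignments = new List<PermissionSetAssignment>();
for (GroupMember gm : [
    SELECT UserOrGroupId
    FROM GroupMember
    WHERE Group.DeveloperName = 'RAG_Pilot_Users'
]) {
    // Assumes the group holds users only; nested groups would fail assignment.
    assignments.add(new PermissionSetAssignment(
        PermissionSetId = ps.Id,
        AssigneeId = gm.UserOrGroupId
    ));
}
// Partial success allowed so already-assigned users do not block the rest.
Database.insert(assignments, false);
```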
The retrieval options behind these steps differ mainly in where the content lives and how candidates are ranked:
- Knowledge: retrieval over published Knowledge articles. The native path for Service Cloud Einstein features like Reply Recommendations and Case Summary.
- Data Cloud: retrieval over any Data Cloud entity, including unified profiles, ingested external data, and calculated insights. The path for cross-system grounding.
- Custom Apex retriever: an invocable Apex action that returns text chunks from any source. Used when retrieval needs custom logic, external vector stores, or domain-specific embeddings; a sketch follows this list.
- Hybrid search: combines vector similarity with keyword (BM25) scoring. Better than pure vector for queries that mention specific names, codes, or numbers.
- Reranking: an optional second pass that re-scores the top candidates using a more accurate but slower model. Costs latency, improves precision.
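For the bring-your-own path mentioned at the top of this section, the custom Apex retriever is the piece you write. A minimal sketch is below; the Document_Chunk__c object, its field, and the class names are invented, and the exact request/response contract Prompt Builder expects from Apex grounding actions should be confirmed against current documentation before relying on this shape.

```apex
// Sketch of a custom retriever exposed as an invocable action. It runs a
// keyword search over an assumed Document_Chunk__c object and returns the
// concatenated chunks as grounding text. Swap the SOSL call for a callout
// to an external vector store if that is where your embeddings live.
public with sharing class CustomChunkRetriever {

    public class Request {
        @InvocableVariable(required=true)
        public String searchText;
    }

    public class Response {
        @InvocableVariable
        public String Prompt;   // grounding text handed back to the template
    }

    @InvocableMethod(label='Retrieve Grounding Chunks')
    public static List<Response> retrieve(List<Request> requests) {
        List<Response> out = new List<Response>();
        for (Request req : requests) {
            List<List<SObject>> found = [
                FIND :req.searchText
                IN ALL FIELDS
                RETURNING Document_Chunk__c(Body__c LIMIT 5)
            ];
            List<String> bodies = new List<String>();
            for (Document_Chunk__c c : (List<Document_Chunk__c>) found[0]) {
                bodies.add(c.Body__c);
            }
            Response res = new Response();
            res.Prompt = String.join(bodies, '\n---\n');
            out.add(res);
        }
        return out;
    }
}
```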
- Retrieval quality bounds answer quality. A perfect model cannot fix the wrong document being retrieved. Test retrieval separately from generation.
- The embedding model and the source content language must match. An English embedding model on Japanese content returns nonsense rankings.
- Stale indices ground confident wrong answers. Reindex on source change, not on a fixed schedule.
- Top-k too high dilutes context. Past 8 chunks the model often ignores the lower-ranked ones anyway. Spend the token budget elsewhere.
- RAG hides retrieval failures behind fluent answers. The model rarely says "I could not find anything relevant" on its own. The template must force that behavior.
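One way to force it is an explicit fallback instruction in the template body, along these lines (illustrative wording, not a prescribed phrase):

```
If none of the retrieved passages answer the question, respond with exactly:
"No relevant source was found for this question."
Do not answer from general knowledge.
```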