Agentforce Data Library
Definition
An Agentforce Data Library is a managed knowledge store inside Salesforce that holds unstructured content (Knowledge articles, files, web pages, uploaded documents) and serves it back to Agentforce agents through retrieval-augmented generation. The library chunks each piece of content, embeds the chunks into a vector index, and exposes a Data Library Search action that the Atlas Reasoning Engine can call to ground its answers in trusted source material.
Without a Data Library, an Agentforce agent answers from the underlying LLM's training data plus whatever record data the actions pull. That works for transactional questions like order status. It fails for policy or how-to questions where the right answer lives in a PDF or a Knowledge article. The library closes that gap. Each library is scoped to a specific topic or audience, so a Returns library and a Pricing library can coexist on the same agent without bleeding context.
Why a Data Library is the difference between guessing and grounding
Where Data Libraries live in setup
Open Setup, search for Data Libraries, and you land on the library list. Each library has a name, a description, a list of content sources, an embedding model, and a status field that tracks indexing progress. The library description is the field the Atlas Reasoning Engine reads when a Data Library Search action is invoked; the engine uses it to decide which library to query if multiple are attached to the same topic. A specific description ("Pricing, discount policy, and promotional rules for North America") outperforms a generic one ("Pricing docs"). Libraries are org-level objects, so multiple agents can attach the same library through their actions.
Content sources and how ingestion works
Four source types are supported: Knowledge articles (filtered by data category, channel, or article type), Salesforce Files (filtered by library or folder), URLs (a sitemap or a list of pages), and direct uploads of PDFs, DOCX, HTML, and Markdown. Each source has its own refresh cadence. Knowledge re-indexes when the article is published. Files re-index when a new version is uploaded. URL sources re-fetch on a schedule (daily by default). Uploads only re-index when you replace the file. Mixing sources in one library is supported and common: a Pricing library might pull Knowledge for policy, Files for the rate card, and a URL for the public pricing page.
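As a way to keep the four source types and their refresh triggers straight, here is a minimal Python sketch. The class names, enum values, and trigger labels are illustrative assumptions for this article, not Salesforce API names; only the refresh behavior itself comes from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    KNOWLEDGE = "knowledge"
    FILES = "files"
    URL = "url"
    UPLOAD = "upload"


# Refresh triggers as described in the text; the wording of each label is illustrative.
REFRESH_TRIGGER = {
    SourceType.KNOWLEDGE: "on article publish",
    SourceType.FILES: "on new file version",
    SourceType.URL: "scheduled re-fetch (daily by default)",
    SourceType.UPLOAD: "on file replacement only",
}


@dataclass
class ContentSource:
    """Hypothetical model of one content source attached to a library."""
    source_type: SourceType
    label: str

    def refresh_trigger(self) -> str:
        return REFRESH_TRIGGER[self.source_type]


# The mixed Pricing library from the example above.
pricing_library = [
    ContentSource(SourceType.KNOWLEDGE, "Pricing policy articles"),
    ContentSource(SourceType.FILES, "Rate card"),
    ContentSource(SourceType.URL, "Public pricing page"),
]

for source in pricing_library:
    print(f"{source.label}: {source.refresh_trigger()}")
```

Listing the trigger next to each source makes it easier to spot which parts of a mixed library can silently fall behind, uploads in particular.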
Chunking, embedding, and the vector index
Ingestion splits each source document into chunks of around 500 tokens with a 50-token overlap. Each chunk is sent to the configured embedding model (usually the Salesforce-managed model) and the resulting vector is stored in the library index. The original chunk text is stored alongside the vector so it can be returned to the agent at query time. A 50-page PDF typically produces 80 to 120 chunks. The chunking is content-aware: tables, headers, and lists are preserved as boundaries rather than split mid-row. The embedding model choice is fixed per library and cannot be changed after ingestion without a full re-index.
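The fixed-size part of that chunking can be sketched as a sliding window. This is a simplified illustration only: it assumes the document is already tokenized into a list, and it ignores the content-aware boundary handling (tables, headers, lists) that the real ingestion applies. The function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap  # each new chunk starts 450 tokens after the last
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
        start += stride
    return chunks


# Stand-in token IDs; a real pipeline would use the embedding model's tokenizer.
chunks = chunk_tokens(list(range(1200)))
print(len(chunks), [len(c) for c in chunks])
```

Under the additional assumption of roughly 800 tokens per page, a 50-page PDF is about 40,000 tokens, and `len(chunk_tokens(list(range(40000))))` comes out at 89 chunks, inside the 80 to 120 range quoted above.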
How a Data Library Search action queries the library
A Data Library Search action takes a natural-language query string as input and returns the top N chunks that match. The action embeds the query string with the same model used during ingestion, runs a vector similarity search against the library index, and returns the matching chunk text plus source metadata (article ID, file name, URL). The agent then uses those chunks as grounding for its response. The action's prompt template can be customized to instruct the agent to cite the source URL or article number, which produces the trust footer you see in well-built Agentforce agents.
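The retrieval step can be pictured with a toy top-N search. This sketch assumes cosine similarity as the similarity measure and uses hand-written three-dimensional vectors in place of real embeddings; the index layout, the `search` function, and the sample chunk text are all made up for illustration and say nothing about how the platform stores or scores vectors internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index, query_vector, top_n=5):
    """Return the top_n chunks most similar to the query, with source metadata."""
    scored = [
        (cosine_similarity(query_vector, entry["vector"]), entry)
        for entry in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"text": entry["text"], "source": entry["source"], "score": score}
        for score, entry in scored[:top_n]
    ]


# Toy index: in practice the vectors come from the library's embedding model.
index = [
    {"vector": [0.9, 0.1, 0.0], "text": "Sample returns policy chunk.", "source": "KA-0001"},
    {"vector": [0.1, 0.9, 0.0], "text": "Sample discount policy chunk.", "source": "rate-card.pdf"},
    {"vector": [0.0, 0.2, 0.9], "text": "Sample onboarding chunk.", "source": "https://example.com/onboarding"},
]

# The query vector stands in for the embedded query string.
results = search(index, [0.2, 0.8, 0.1], top_n=2)
for hit in results:
    print(hit["source"], round(hit["score"], 3))
```

The point of the sketch is the shape of the result: chunk text plus source metadata, ranked by similarity, which is what the agent then uses for grounding and citation.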
Library scoping and permissions
Each library has a permission set that controls who can read its content through the agent. The same library can be served behind multiple agents, but the agent's invocation context determines which user permissions apply. A library that contains internal-only pricing should sit behind an internal sales agent, not a public-facing service agent. The permission check happens at query time, so a user who lacks access to the underlying Knowledge article never sees that chunk in the response. This is one of the strongest reasons to use Data Libraries over passing raw documents into a prompt: the security model is enforced.
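To illustrate what a query-time permission check means in practice, the sketch below drops any retrieved chunk whose source record the requesting user cannot read. The field name `source_id`, the function `filter_by_access`, and the idea of passing in a set of accessible IDs are assumptions for the example; the platform's actual enforcement mechanism is not shown here.

```python
def filter_by_access(chunks, accessible_source_ids):
    """Keep only chunks whose underlying source the current user can read."""
    return [c for c in chunks if c["source_id"] in accessible_source_ids]


retrieved = [
    {"source_id": "KA-public-1", "text": "Public pricing overview."},
    {"source_id": "KA-internal-7", "text": "Internal discount floor."},
]

# A public-facing user can read only the public article.
visible = filter_by_access(retrieved, accessible_source_ids={"KA-public-1"})
print([c["source_id"] for c in visible])
```

Because the filter runs after retrieval and before the response is composed, a restricted chunk never reaches the prompt for a user who lacks access to its source.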
Re-indexing, freshness, and how to know what is stale
The library list page shows last-indexed timestamps per source and per library. A source that has not re-indexed in 30 days usually points to a problem upstream (deleted Knowledge category, broken URL, file lock). Triggering a manual re-index takes one click, but the re-index itself runs for 5 to 30 minutes depending on document count. Schedule a weekly re-index for URL sources where the pages change without a clear signal. The "What changed" view shows which chunks were added, removed, or modified in the last re-index, which is useful for auditing policy drift without diffing every source by hand.
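The 30-day rule of thumb is easy to apply to an export of last-indexed timestamps. This is a minimal sketch that assumes you have the timestamps in a list of dictionaries; the record layout, source names, and the `stale_sources` helper are hypothetical, not a Salesforce export format.

```python
from datetime import datetime, timedelta


def stale_sources(sources, now, max_age_days=30):
    """Return the names of sources last indexed more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["name"] for s in sources if s["last_indexed"] < cutoff]


now = datetime(2025, 6, 30)
sources = [
    {"name": "Pricing Knowledge", "last_indexed": datetime(2025, 6, 28)},
    {"name": "Public pricing URL", "last_indexed": datetime(2025, 5, 1)},
    {"name": "Rate card file", "last_indexed": datetime(2025, 6, 1)},
]

print(stale_sources(sources, now))
```

Anything the check flags is a prompt to look at the source itself first, since the usual cause is a broken or removed source rather than a failed index job.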
Cost, quota, and what counts as a query
Data Library ingestion is billed by token volume embedded. Each Data Library Search action call is billed as a query plus the tokens returned. Most orgs sit well within the Agentforce SKU's included quota, but a chatty agent that calls the search action three times per turn can drain that quota quickly. Watch the Agentforce Usage report for query-per-conversation trends. A spike is almost always a sign the topic's classification description is too broad and the agent is searching the library on irrelevant turns. The fix is upstream in the topic, not in the library itself.
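If you export search-call records, the query-per-conversation trend is a short aggregation. The sketch below assumes a hypothetical log format with one record per search call and a `conversation_id` field; it does not reflect the actual schema of the Agentforce Usage report.

```python
from collections import Counter


def queries_per_conversation(search_calls):
    """Count search-action calls for each conversation."""
    return Counter(call["conversation_id"] for call in search_calls)


def average_queries(search_calls):
    """Average number of search calls per conversation (0.0 if there are none)."""
    counts = queries_per_conversation(search_calls)
    return sum(counts.values()) / len(counts) if counts else 0.0


calls = [
    {"conversation_id": "c1"},
    {"conversation_id": "c1"},
    {"conversation_id": "c1"},
    {"conversation_id": "c2"},
]

print(queries_per_conversation(calls).most_common(1))
print(average_queries(calls))
```

Tracking the average over time, and looking at the conversations at the top of the count, is one way to catch a topic that has started searching on irrelevant turns.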
How to stand up a Data Library and wire it into an agent
Standing up a Data Library is mostly a content curation exercise dressed in setup screens. The technical steps take an hour. The work of deciding which Knowledge articles, which files, and which URLs belong in the library takes a week, and skipping that work produces agents that confidently cite the wrong policy.
- Decide the library scope before opening setup
Create one library per audience and topic combination, for example Pricing, Returns, Onboarding, or Compliance. Resist a single "All Knowledge" library; it makes the engine pick less accurately and exposes content across permission boundaries.
- Open Setup, Data Libraries, New Library
Name the library something descriptive. Write the library description in the same style as an Agent Action instruction: name what is in scope and what is out of scope so the engine picks the right library at query time.
- Add content sources
Pick a source type (Knowledge, Files, URL, Upload). Filter Knowledge by data category, channel, or article type. Filter Files by library or folder. For URLs, provide a sitemap or a list of pages. For uploads, drag files in.
- Trigger initial ingestion and wait
The library status moves from Pending to Indexing to Ready. Initial ingestion of a hundred articles takes 10 to 20 minutes. Larger libraries take proportionally longer.
- Create a Data Library Search Agent Action
Setup, Agent Actions, New Agent Action, Data Library Search. Pick the library you just created. Write the action instruction in terms of the user phrases that should trigger it.
- Attach the action to one or more Agent Topics
Open Agent Builder, find the topic that should query this library, add the action. Test that the action fires on relevant turns in the Conversation Preview.
- Tune the response prompt to include citations
Open the action's response prompt template. Add instructions to cite the source article number or URL in the response. The chunks returned by the action carry that metadata already.
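For the citation step, it helps to see how chunk metadata could be turned into numbered references that a response prompt can point at. This is a sketch under assumptions: the chunk dictionaries, the `format_grounding` helper, and the bracketed numbering are illustrative, not the format the action actually passes to the prompt template.

```python
def format_grounding(chunks):
    """Render retrieved chunks as numbered lines that carry their source."""
    lines = []
    for number, chunk in enumerate(chunks, start=1):
        lines.append(f"[{number}] {chunk['text']} (source: {chunk['source']})")
    return "\n".join(lines)


chunks = [
    {"text": "Sample discount policy chunk.", "source": "KA-0042"},
    {"text": "Sample rate card chunk.", "source": "https://example.com/pricing"},
]

print(format_grounding(chunks))
```

A prompt instruction such as "cite the bracketed number and source for every claim" then has something concrete to refer to.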
Library description: The field the engine reads to pick between multiple libraries on the same topic. Write it like an Agent Action instruction.
Embedding model: The model used to convert chunks and queries into vectors. Salesforce-managed is the default; the choice is fixed per library after ingestion.
Content sources: The list of sources feeding the library. Mix Knowledge, Files, URLs, and uploads as needed.
Refresh cadence: How often the library re-checks the source for changes. Defaults to daily for URLs and on-publish for Knowledge and Files.
Top N: How many matching chunks the search action returns per query. Default is 5. A higher value gives the agent more context but costs more tokens.
- The library description is the field that decides which library the engine picks when multiple are attached to a topic. A vague description gives the engine little to distinguish one library from another, so it picks unreliably.
- Embedding model choice is fixed per library after ingestion. Changing it requires a full re-index and burns the ingestion quota again.
- Permissions are enforced at query time, but only against the underlying source records. If you bulk-upload a PDF that contains restricted content, every user with access to the library can read it.
- URL sources do not respect robots.txt by default. A public-facing site may block ingestion unexpectedly; check the source error log if a library never finishes indexing.
- Chunking is content-aware but not perfect. Tables that span 30 rows can be split mid-table. Audit a sample of chunks from your largest documents before declaring the library production-ready.
About the Author
Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.