Standing up a Data Library is mostly a content curation exercise dressed in setup screens. The technical steps take an hour. The work of deciding which Knowledge articles, which files, and which URLs belong in the library takes a week, and skipping that work produces agents that confidently cite the wrong policy.
- Decide the library scope before opening setup
Create one library per audience and topic combination, for example Pricing, Returns, Onboarding, or Compliance. Resist a single "All Knowledge" library; it makes the engine pick less accurately and exposes content across permission boundaries.
- Open Setup, Data Libraries, New Library
Name the library something descriptive. Write the library description in the same style as an Agent Action instruction: name what is in scope and what is out of scope so the engine picks the right library at query time.
- Add content sources
Pick a source type (Knowledge, Files, URL, Upload). Filter Knowledge by data category, channel, or article type. Filter Files by library or folder. For URLs, provide a sitemap or a list of pages. For uploads, drag files in.
- Trigger initial ingestion and wait
The library status moves from Pending to Indexing to Ready. Initial ingestion of a hundred articles takes 10 to 20 minutes. Larger libraries take proportionally longer.
- Create a Data Library Search Agent Action
Go to Setup, Agent Actions, New Agent Action, and choose Data Library Search. Pick the library you just created. Write the action instruction in terms of the user phrases that should trigger it.
- Attach the action to one or more Agent Topics
Open Agent Builder, find the topic that should query this library, and add the action. In the Conversation Preview, test that the action fires on relevant turns.
- Tune the response prompt to include citations
Open the action's response prompt template. Add instructions to cite the source article number or URL in the response. The chunks returned by the action carry that metadata already.
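The ingestion timing in the steps above can be turned into a rough planning number. The following is a minimal sketch that assumes the "proportionally longer" guidance means linear scaling from the 100-article baseline of 10 to 20 minutes; the function name is hypothetical and real ingestion times will vary with content size and source type.

```python
def estimate_ingestion_minutes(article_count: int) -> tuple[float, float]:
    """Scale the 100-article baseline (10 to 20 minutes) linearly.

    Assumption: ingestion time grows in proportion to article count.
    Returns a (low, high) estimate in minutes.
    """
    if article_count < 0:
        raise ValueError("article_count must be non-negative")
    baseline_articles = 100
    low_minutes, high_minutes = 10.0, 20.0
    scale = article_count / baseline_articles
    return (low_minutes * scale, high_minutes * scale)


low, high = estimate_ingestion_minutes(750)
print(f"750 articles: roughly {low:.0f} to {high:.0f} minutes")
```

Treat the result as a scheduling aid only; the library status (Pending, Indexing, Ready) remains the source of truth.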
The description is the field the engine reads to pick between multiple libraries on the same topic. Write it like an Agent Action instruction.
The embedding model converts chunks and queries into vectors. Salesforce-managed is the default, and the choice is fixed per library after ingestion.
The source list defines what feeds the library. Mix Knowledge, Files, URLs, and uploads as needed.
The refresh schedule controls how often the library re-checks each source for changes. It defaults to daily for URLs and on-publish for Knowledge and Files.
The result count sets how many matching chunks the search action returns per query. The default is 5. A higher value gives the agent more context but costs more tokens.
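The trade-off between chunks returned and token cost can be estimated with simple arithmetic. This sketch assumes an average chunk size of 400 tokens, which is an illustrative figure and not a documented value; measure your own chunks before relying on it. The function name is hypothetical.

```python
def retrieval_context_tokens(chunks_returned: int, avg_chunk_tokens: int) -> int:
    """Approximate tokens added to each agent turn by retrieved chunks."""
    if chunks_returned < 0 or avg_chunk_tokens < 0:
        raise ValueError("inputs must be non-negative")
    return chunks_returned * avg_chunk_tokens


AVG_CHUNK_TOKENS = 400  # assumption for illustration only

for chunks_returned in (3, 5, 10):
    tokens = retrieval_context_tokens(chunks_returned, AVG_CHUNK_TOKENS)
    print(f"{chunks_returned} chunks per query: about {tokens} context tokens")
```

Under that assumption, doubling the setting from the default of 5 to 10 doubles the retrieved context on every turn that calls the action.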
- The library description is the field that decides which library the engine picks when multiple are attached to a topic. With a vague description, the engine's choice is effectively arbitrary.
- Embedding model choice is fixed per library after ingestion. Changing it requires a full re-index and burns the ingestion quota again.
- Permissions are enforced at query time, but only against the underlying source records. If you bulk-upload a PDF that contains restricted content, every user with access to the library can read it.
- URL sources respect robots.txt by default, so a public-facing site may block ingestion unexpectedly. Check the source error log if a library never finishes indexing.
- Chunking is content-aware but not perfect. Tables that span 30 rows can be split mid-table. Audit a sample of chunks from your largest documents before declaring the library production-ready.
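The chunk audit mentioned in the last item can be partly automated. The sketch below assumes you have exported a document's chunks, in order, as plain strings and that tables appear as pipe-delimited rows; both are assumptions about your export, not documented behavior. It flags places where one chunk ends on a table row and the next begins on one, which suggests a table split across chunks. All function names are hypothetical.

```python
def looks_like_table_row(line: str) -> bool:
    """Heuristic: a pipe-delimited row contains at least two '|' characters."""
    return line.count("|") >= 2


def first_nonempty_line(text: str) -> str:
    for line in text.splitlines():
        if line.strip():
            return line
    return ""


def last_nonempty_line(text: str) -> str:
    for line in reversed(text.splitlines()):
        if line.strip():
            return line
    return ""


def find_split_tables(chunks: list[str]) -> list[int]:
    """Return each index i where a table seems to run from chunk i into chunk i + 1."""
    suspects = []
    for i in range(len(chunks) - 1):
        ends_in_table = looks_like_table_row(last_nonempty_line(chunks[i]))
        starts_in_table = looks_like_table_row(first_nonempty_line(chunks[i + 1]))
        if ends_in_table and starts_in_table:
            suspects.append(i)
    return suspects


sample_chunks = [
    "Return windows by region\n| Region | Days |\n| EU | 30 |",
    "| US | 45 |\n| APAC | 30 |\nContact support for exceptions.",
    "Refunds are issued to the original payment method.",
]
print(find_split_tables(sample_chunks))  # prints [0]
```

A flagged index is only a prompt for manual review: the second chunk in the sample has lost its header row, so an agent citing it would not know what the columns mean.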