Dataset Builder
Definition
Dataset Builder is the visual tool inside CRM Analytics that lets you create a dataset by picking a Salesforce object and dragging in the related objects and fields you want to include. The output is a generated dataflow JSON definition with the right sfdcDigest and augment steps in place; you save and run it, and the dataset is registered for use in dashboards and lenses. It is the fastest way to produce a new analytical dataset without hand-writing a dataflow.
The builder shows objects as boxes connected by their lookup and master-detail relationships. You pick a root object (Opportunity, for example), then click related objects (Account, Owner, Stage History) to add them, and check the fields you want from each. When you save, CRM Analytics generates the dataflow steps, registers the dataset, and runs the first extract. Future schema changes (added fields, new relationships) require editing the generated dataflow directly or rebuilding through Dataset Builder.
What Dataset Builder generates behind the scenes
From visual picks to dataflow JSON
When you save in Dataset Builder, CRM Analytics generates a sequence of sfdcDigest steps (one per object you included) and augment steps (one per relationship traversal). The augments behave as left-outer joins, which is the only join behavior the dataflow augment step offers. A final sfdcRegister step writes the combined stream as a dataset under the name you provided when saving. You can inspect the generated JSON in the Dataflow Editor and tweak it directly if Dataset Builder's defaults are not quite right.
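To make the shape of that output concrete, here is a minimal Python sketch that assembles a dataflow definition with the same three step types. The node names (Extract_Opportunity, Augment_Account, Register) and the helper build_dataflow are illustrative assumptions; the names Dataset Builder actually generates will differ, and the real JSON may carry extra parameters.

```python
import json


def build_dataflow(root, root_fields, related, dataset_name):
    """Assemble a dataflow definition as a plain dict.

    related: list of (relationship, sobject, lookup_field, fields) tuples,
    each one hop from the root. Node names here are hypothetical.
    """
    dataflow = {}
    root_node = f"Extract_{root}"
    dataflow[root_node] = {
        "action": "sfdcDigest",
        "parameters": {
            "object": root,
            "fields": [{"name": f} for f in root_fields],
        },
    }
    source = root_node
    for relationship, sobject, lookup_field, fields in related:
        extract_node = f"Extract_{sobject}"
        dataflow[extract_node] = {
            "action": "sfdcDigest",
            "parameters": {
                "object": sobject,
                # Id is needed as the join key on the right side.
                "fields": [{"name": f} for f in ["Id", *fields]],
            },
        }
        augment_node = f"Augment_{relationship}"
        dataflow[augment_node] = {
            "action": "augment",
            "parameters": {
                "left": source,
                "left_key": [lookup_field],
                "relationship": relationship,
                "right": extract_node,
                "right_key": ["Id"],
                "right_select": list(fields),
            },
        }
        # Chain the next augment onto the output of this one.
        source = augment_node
    dataflow["Register"] = {
        "action": "sfdcRegister",
        "parameters": {
            "alias": dataset_name,
            "name": dataset_name,
            "source": source,
        },
    }
    return dataflow


dataflow = build_dataflow(
    root="Opportunity",
    root_fields=["Id", "Amount", "CloseDate", "StageName", "AccountId", "OwnerId"],
    related=[
        ("Account", "Account", "AccountId", ["Name", "Industry"]),
        ("Owner", "User", "OwnerId", ["Name"]),
    ],
    dataset_name="Opportunities_With_Account",
)
print(json.dumps(dataflow, indent=2))
```

Each augment takes the previous step as its left input, so the register step only needs to point at the last augment in the chain.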
What you can and cannot include
Dataset Builder follows lookup and master-detail relationships, so any object reachable by those from your root shows up as an addable box. It does not follow many-to-many junctions automatically; you add the junction object as an intermediate step and pick the second hop manually. Picklist values, formula fields, and standard fields all come along. Long Text Areas over 32 KB and Encrypted fields are excluded because CRM Analytics datasets do not store them.
Field naming and the API name dance
The dataset's column names default to the field's API name, not the user-visible label. Account.Industry stays Account.Industry, and custom fields keep their __c suffix. If you want friendlier labels in dashboards, you set them in the dataset's XMD (extended metadata) after the first extract. Dataset Builder does not currently let you alias columns at creation time.
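As a rough illustration of the naming rule, the Python sketch below derives column names from API names and then applies friendlier labels afterwards. The function dataset_columns and the labels dictionary are hypothetical; the dictionary is a simplified stand-in for XMD, not the actual XMD file format, and Region__c and Tier__c are made-up custom fields.

```python
def dataset_columns(root_fields, related):
    """Return column names: root fields as-is, related fields prefixed
    with the relationship name (for example Account.Industry)."""
    columns = list(root_fields)
    for relationship, fields in related.items():
        columns.extend(f"{relationship}.{field}" for field in fields)
    return columns


columns = dataset_columns(
    ["Id", "Amount", "Region__c"],
    {"Account": ["Industry", "Tier__c"]},
)

# Friendly labels are applied after creation; unlabeled columns keep
# their API name, including the __c suffix.
labels = {"Account.Industry": "Industry", "Region__c": "Region"}
display_names = [labels.get(column, column) for column in columns]

print(columns)
print(display_names)
```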
Dataset Builder versus Data Prep recipes
Dataset Builder produces a dataflow, not a recipe, even in modern orgs where Data Prep recipes are the preferred ingestion path. If you want a recipe-based equivalent, build the recipe from scratch using the Data Prep node-based editor and pick objects there. Dataset Builder remains useful for quick prototypes and for orgs still running on dataflows; for new production pipelines, recipes are the recommended target.
Refresh and schedule semantics
After Dataset Builder creates the dataflow, the dataset refreshes on whatever schedule that dataflow runs (hourly, every 3 hours, every 8 hours, or daily). The first run extracts all rows; subsequent runs do a full extract unless you edit the dataflow to use sfdcDigest with complete: false. Most teams switch to incremental sfdcDigest after the prototype proves out, because full extracts on multi-million-row objects burn Bulk API calls.
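A back-of-the-envelope Python sketch shows why the switch matters. The row counts are hypothetical, and the calculation only counts rows pulled per day in steady state; it does not model actual API call consumption or the initial full load.

```python
# Runs per day for each schedule mentioned above.
RUNS_PER_DAY = {"hourly": 24, "every_3_hours": 8, "every_8_hours": 3, "daily": 1}


def rows_extracted_per_day(schedule, total_rows, changed_rows_per_run=None):
    """Rows pulled from Salesforce per day.

    With changed_rows_per_run=None every run is a full extract;
    otherwise each run pulls only the changed rows.
    """
    runs = RUNS_PER_DAY[schedule]
    per_run = total_rows if changed_rows_per_run is None else changed_rows_per_run
    return runs * per_run


# Hypothetical object with 5 million rows, refreshed every 3 hours.
full = rows_extracted_per_day("every_3_hours", 5_000_000)
incremental = rows_extracted_per_day("every_3_hours", 5_000_000, changed_rows_per_run=20_000)

print(f"full: {full:,} rows/day")
print(f"incremental: {incremental:,} rows/day")
```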
Field-level security and the dataset
Dataset Builder respects the field-level security of the user who creates the dataflow at design time, but the dataflow itself runs as the Integration User. Fields hidden from you at design time will not be added to the dataflow, but fields you do have access to will load for all dataset users unless you add row-level or column-level security on top. Audit datasets you inherit from another admin before exposing them broadly.
When to use Dataset Builder versus hand-coded dataflows or recipes
Use Dataset Builder when the dataset is a straightforward join of a few related objects and you want it up in 10 minutes. Use a hand-coded dataflow when you need conditional logic, computed fields, or non-standard joins. Use a Data Prep recipe when the source extends beyond Salesforce (S3, Snowflake, BigQuery) or when push-down to a warehouse matters for cost and speed. The three tools coexist; pick the smallest one that solves the problem.
How to build a dataset with Dataset Builder
Dataset Builder is the fastest way to produce a new CRM Analytics dataset from Salesforce objects. The visual canvas turns relationships into clickable nodes; you pick objects and fields, save, and the underlying dataflow handles the rest.
- Open Data Manager
In CRM Analytics, open Data Manager, then Datasets, then Create Dataset, then Salesforce Object. This launches Dataset Builder for the new dataset.
- Pick the root object
Search for and select the primary object (Opportunity for a sales dataset, Case for service, Account for account-centric reporting). The builder loads the object's relationships into the canvas.
- Add related objects
Click any related object's plus icon to include it. The canvas grows as you add Accounts, Owners, and so on. You can traverse multiple hops by clicking through each related box.
- Pick fields per object
For each object on the canvas, check the fields you want in the dataset. Default selections include the obvious identifiers; add measure fields (Amount, Quantity), date fields (CloseDate), and segmentation fields (StageName, Industry).
- Name the dataset and save
Click Create Dataset, provide a name (Opportunities_With_Account is better than New_Dataset_1), and confirm. CRM Analytics generates the dataflow and runs the first extract immediately.
- Verify the dataset and schedule the dataflow
Open the new dataset to confirm row counts and fields. Then open the generated dataflow in the Dataflow Editor and set the schedule (hourly, every 3 hours, every 8 hours, or daily) for ongoing refreshes.
The root object is the primary object the dataset is built around. Each row in the final dataset corresponds to one root record, with related fields denormalized in.
Dataset Builder supports up to 10 levels of relationships. The practical limit is 4 to 5 hops before performance and complexity make a dataflow rewrite worthwhile.
Standard fields, custom fields, formula fields, and picklists are supported. Long Text Areas over 32 KB and Encrypted fields are excluded automatically.
Saving creates the dataflow and runs it once. Edits to the dataset definition require rebuilding through Dataset Builder or editing the generated dataflow JSON directly.
- Dataset Builder generates left-outer augments, which means rows on the root object without a related record still appear with nulls. If you need inner-join behavior, edit the dataflow and add a filter step after the augment.
- Many-to-many junctions are not auto-traversed. Add the junction as an intermediate object, then pick the far side manually.
- Subsequent edits to the generated dataflow (added fields, changed filters) do not surface back in Dataset Builder. The builder is one-way: visual creation, then text-only editing.
- Dataset Builder creates a dataflow even in orgs where Data Prep is the preferred path. For warehouse push-down or non-Salesforce sources, build a recipe directly instead.
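The left-outer behavior in the first point above can be simulated in a few lines of Python. This is a simplified model of an augment step, not the dataflow runtime itself; the function augment, the record IDs, and the sample data are all made up for illustration.

```python
def augment(left_rows, right_rows, left_key, right_key, relationship, right_select):
    """Left-outer lookup: every left row survives; unmatched rows get None
    for the related columns. If the right side has duplicate keys, the
    last one wins in this simplified model."""
    lookup = {row[right_key]: row for row in right_rows}
    result = []
    for row in left_rows:
        match = lookup.get(row.get(left_key))
        merged = dict(row)
        for field in right_select:
            merged[f"{relationship}.{field}"] = match[field] if match else None
        result.append(merged)
    return result


opportunities = [
    {"Id": "006A", "Amount": 1000, "AccountId": "001A"},
    {"Id": "006B", "Amount": 2500, "AccountId": None},  # no related account
]
accounts = [{"Id": "001A", "Name": "Acme", "Industry": "Energy"}]

joined = augment(opportunities, accounts, "AccountId", "Id", "Account", ["Name", "Industry"])

# Inner-join behavior: filter out rows whose related columns are null.
inner_only = [row for row in joined if row["Account.Name"] is not None]

print(len(joined), len(inner_only))
```

The second opportunity stays in joined with null account columns and drops out only after the filter, which mirrors adding a filter step after the augment in the dataflow.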
About the Author
Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.