What is the most important tip for working with Dataset Builder?

Dataset Builder is a generator, not a runtime. It writes dataflow JSON for you and then steps out of the picture. Once you understand that, the rest of CRM Analytics dataset management makes sense: every dataset built from Salesforce objects has a dataflow or recipe behind it, even if you never wrote one.


Dataset Builder

Q: What does Dataset Builder generate when you save your visual object and field selections?

Dataset Builder generates a dataflow JSON with sfdcDigest and augment steps plus a register step. It does not produce a recipe, an Apex class, or a manual CSV export.

Q: Which fields does Dataset Builder exclude when building a dataset from related objects?

Dataset Builder excludes Long Text Areas over 32 KB and Encrypted fields. Formula, standard, and picklist fields are all included.

Q: How does Dataset Builder traverse relationships when you add related objects to the root?

Dataset Builder follows lookup and master-detail links automatically but not many-to-many junctions, which need the junction object added as an intermediate hop. It does not ignore lookups or require manual SOQL.


§ 01

Definition

Dataset Builder is the visual tool inside CRM Analytics that lets you create a dataset by picking a Salesforce object and dragging in the related objects and fields you want to include. The output is a generated dataflow JSON definition with the right sfdcDigest and augment steps in place; you save and run it, and the dataset is registered for use in dashboards and lenses. It is the fastest way to produce a new analytical dataset without hand-writing a dataflow.

The builder shows objects as boxes connected by their lookup and master-detail relationships. You pick a root object (Opportunity, for example), then click related objects (Account, Owner, Stage History) to add them, and check the fields you want from each. When you save, CRM Analytics generates the dataflow steps, registers the dataset, and runs the first extract. Future schema changes (added fields, new relationships) require editing the generated dataflow directly or rebuilding through Dataset Builder.
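To make the canvas idea concrete, here is a minimal Python sketch that enumerates the relationship paths a root object could expose. It assumes a small hypothetical schema in which each object lists only its child-to-parent lookup or master-detail fields. The `RELATIONSHIPS` map and the `canvas_paths` helper are illustrative, not a Salesforce API, and the real builder may lay out the canvas differently.

```python
# Hypothetical schema: object -> {relationship name: parent object}.
# Only child-to-parent lookup and master-detail links are modeled.
RELATIONSHIPS: dict[str, dict[str, str]] = {
    "Opportunity": {"Account": "Account", "Owner": "User"},
    "Account": {"Owner": "User", "Parent": "Account"},  # Parent is self-referencing
    "User": {},
}


def canvas_paths(root: str, relationships: dict[str, dict[str, str]], max_hops: int) -> list[str]:
    """List every relationship path reachable from root in at most max_hops hops."""
    paths: list[str] = []
    frontier = [(root, root)]  # (path so far, object at the end of that path)
    for _ in range(max_hops):
        next_frontier = []
        for path, obj in frontier:
            for rel_name, parent in relationships.get(obj, {}).items():
                child_path = f"{path}.{rel_name}"
                paths.append(child_path)
                next_frontier.append((child_path, parent))
        frontier = next_frontier
    return paths


print(canvas_paths("Opportunity", RELATIONSHIPS, 2))
```

The self-referencing `Account.Parent` link is why a hop limit matters: without one, the list of paths would never end.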

§ 02

What Dataset Builder generates behind the scenes

From visual picks to dataflow JSON

When you save in Dataset Builder, CRM Analytics generates a sequence of sfdcDigest steps (one per object you included) and augment steps (one per relationship traversal). Each augment behaves as a left outer join, which is the only join behavior the dataflow augment transformation offers. A final sfdcRegister step writes the combined stream as a dataset under the name you provide. You can inspect the generated JSON in the Dataflow Editor and edit it directly if Dataset Builder's defaults are not quite right.
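For orientation, the Python sketch below assembles a dataflow definition with the shape described above for one root object and one related object: two sfdcDigest nodes, one augment, and one sfdcRegister. The node names (`Extract_Opportunity`, `Register_Dataset`, and so on) and the `build_dataflow` helper are hypothetical. The parameter keys (`object`, `fields`, `left`, `left_key`, `right`, `right_key`, `right_select`, `relationship`, `source`, `alias`, `name`) follow the dataflow transformation reference, but compare them against the JSON your own org actually generates before relying on them.

```python
import json


def build_dataflow(root: str, root_fields: list[str], lookup_field: str,
                   related: str, related_fields: list[str],
                   relationship: str, dataset_alias: str) -> dict:
    """Sketch of a root-plus-one-related-object dataflow (node names are hypothetical)."""
    root_node = f"Extract_{root}"
    related_node = f"Extract_{related}"
    augment_node = f"Augment_{root}_{related}"
    return {
        # One sfdcDigest per object on the canvas.
        root_node: {
            "action": "sfdcDigest",
            "parameters": {"object": root, "fields": [{"name": f} for f in root_fields]},
        },
        related_node: {
            "action": "sfdcDigest",
            # The related object's Id is extracted so the augment can join on it.
            "parameters": {"object": related,
                           "fields": [{"name": f} for f in ["Id", *related_fields]]},
        },
        # One augment per relationship traversal: root lookup field -> related Id.
        augment_node: {
            "action": "augment",
            "parameters": {
                "left": root_node,
                "left_key": [lookup_field],
                "right": related_node,
                "right_key": ["Id"],
                "right_select": related_fields,
                "relationship": relationship,
            },
        },
        # A final sfdcRegister publishes the joined stream as a dataset.
        "Register_Dataset": {
            "action": "sfdcRegister",
            "parameters": {"source": augment_node, "alias": dataset_alias, "name": dataset_alias},
        },
    }


flow = build_dataflow(
    root="Opportunity",
    root_fields=["Id", "AccountId", "Amount", "StageName", "CloseDate"],
    lookup_field="AccountId",
    related="Account",
    related_fields=["Name", "Industry"],
    relationship="Account",
    dataset_alias="Opportunities_With_Account",
)
print(json.dumps(flow, indent=2))
```

Adding a second related object would mean one more sfdcDigest node and one more augment whose `left` is the previous augment, with the register pointing at the last augment in the chain.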

What you can and cannot include

Dataset Builder follows lookup and master-detail relationships, so any object reachable by those from your root shows up as an addable box. It does not follow many-to-many junctions automatically; you add the junction object as an intermediate step and pick the second hop manually. Picklist values, formula fields, and standard fields all come along. Long Text Areas over 32 KB and Encrypted fields are excluded because CRM Analytics datasets do not store them.
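The exclusion rules can be expressed as a small predicate. This Python sketch uses a hypothetical `FieldInfo` record rather than real Salesforce describe metadata, and it assumes "over 32 KB" means a declared Long Text Area length above 32,768 characters; the builder's exact cutoff may differ.

```python
from dataclasses import dataclass

# Assumption: "32 KB" is read as a declared length of 32,768 characters.
LONG_TEXT_LIMIT = 32 * 1024


@dataclass
class FieldInfo:
    name: str
    field_type: str  # hypothetical type labels such as "textarea", "picklist", "currency"
    length: int = 0
    encrypted: bool = False


def includable(field: FieldInfo) -> bool:
    """Apply the two exclusion rules described above; everything else is kept."""
    if field.encrypted:
        return False
    if field.field_type == "textarea" and field.length > LONG_TEXT_LIMIT:
        return False
    return True


fields = [
    FieldInfo("StageName", "picklist"),
    FieldInfo("Amount", "currency"),
    FieldInfo("Description", "textarea", length=32_000),
    FieldInfo("Call_Notes__c", "textarea", length=131_072),
    FieldInfo("SSN__c", "string", length=11, encrypted=True),
]
print([f.name for f in fields if includable(f)])
# ['StageName', 'Amount', 'Description']
```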

Field naming and the API name dance

The dataset's column names default to the field's API name, not the user-visible label. Account.Industry stays Account.Industry, and custom fields keep their __c suffix. If you want friendlier labels in dashboards, you set them in the dataset's XMD (extended metadata) after the first extract. Dataset Builder does not currently let you alias columns at creation time.
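The `Account.` prefix comes from the augment step: in a dataflow augment, columns pulled from the right-hand object are prefixed with the augment's `relationship` value, while root-object columns keep their plain API names. The sketch below assumes the generated augment uses `Account` as its relationship, which is what the `Account.Industry` example implies; the `dataset_columns` helper is hypothetical.

```python
def dataset_columns(left_fields: list[str], right_select: list[str], relationship: str) -> list[str]:
    """Root columns keep their API names; augmented columns get '<relationship>.' in front."""
    return list(left_fields) + [f"{relationship}.{name}" for name in right_select]


columns = dataset_columns(
    left_fields=["Id", "Amount", "Deal_Score__c"],
    right_select=["Industry", "Tier__c"],
    relationship="Account",
)
print(columns)
# ['Id', 'Amount', 'Deal_Score__c', 'Account.Industry', 'Account.Tier__c']
```

Custom fields keep their `__c` suffix either way, so friendlier labels still have to come from the XMD.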

Dataset Builder versus Data Prep recipes

Dataset Builder produces a dataflow, not a recipe, even in modern orgs where Data Prep recipes are the preferred ingestion path. If you want a recipe-based equivalent, build the recipe from scratch using the Data Prep node-based editor and pick objects there. Dataset Builder remains useful for quick prototypes and for orgs still running on dataflows; for new production pipelines, recipes are the recommended target.

Refresh and schedule semantics

After Dataset Builder creates the dataflow, the dataset refreshes on whatever schedule that dataflow runs on (for example hourly, every 3 hours, every 8 hours, or daily). The first run extracts all rows, and subsequent runs also do a full extract unless you edit the dataflow to set incremental: true on the sfdcDigest step. Most teams switch to incremental sfdcDigest after the prototype proves out, because full extracts on multi-million-row objects burn Bulk API calls.
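Switching a generated dataflow to incremental extraction is a one-flag change per sfdcDigest node. This Python sketch assumes the flag is the sfdcDigest `incremental` parameter, as listed in the sfdcDigest reference; the `make_incremental` helper and the sample node names are hypothetical.

```python
import copy


def make_incremental(dataflow: dict) -> dict:
    """Return a copy of the dataflow with incremental extraction on every sfdcDigest node."""
    updated = copy.deepcopy(dataflow)  # leave the caller's definition untouched
    for node in updated.values():
        if node.get("action") == "sfdcDigest":
            node.setdefault("parameters", {})["incremental"] = True
    return updated


original = {
    "Extract_Opportunity": {"action": "sfdcDigest",
                            "parameters": {"object": "Opportunity", "fields": [{"name": "Id"}]}},
    "Register_Dataset": {"action": "sfdcRegister",
                         "parameters": {"source": "Extract_Opportunity",
                                        "alias": "Opps", "name": "Opps"}},
}
incremental = make_incremental(original)
print(incremental["Extract_Opportunity"]["parameters"]["incremental"])  # True
```

Only digest nodes are touched, because the flag has no role on augment or sfdcRegister nodes.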

Field-level security and the dataset

Dataset Builder respects the field-level security of the user who builds the dataset at design time, but the dataflow itself runs as the Analytics Cloud Integration User. Fields hidden from you at design time will not be added to the dataflow, but fields you can see will load for every dataset user unless you add row-level or column-level security on top. Audit datasets you inherit from another admin before exposing them broadly.
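One way to add row-level security on top is a security predicate on the sfdcRegister node. The Python sketch below sets the `rowLevelSecurityFilter` parameter to an owner-only predicate of the form `'OwnerId' == "$User.Id"`. The helper and node names are hypothetical, and the predicate is only an example; write one that matches your own sharing model.

```python
import copy

# Example predicate: each user sees only rows whose OwnerId is their own user Id.
OWNER_ONLY = "'OwnerId' == \"$User.Id\""


def add_row_level_security(dataflow: dict, register_node: str, predicate: str) -> dict:
    """Return a copy of the dataflow with a security predicate on the given register node."""
    updated = copy.deepcopy(dataflow)
    node = updated[register_node]
    if node.get("action") != "sfdcRegister":
        raise ValueError(f"{register_node} is not an sfdcRegister node")
    node["parameters"]["rowLevelSecurityFilter"] = predicate
    return updated


flow = {
    "Register_Dataset": {"action": "sfdcRegister",
                         "parameters": {"source": "Augment_Opportunity_Account",
                                        "alias": "Opps", "name": "Opps"}},
}
secured = add_row_level_security(flow, "Register_Dataset", OWNER_ONLY)
print(secured["Register_Dataset"]["parameters"]["rowLevelSecurityFilter"])
# 'OwnerId' == "$User.Id"
```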

When to use Dataset Builder versus hand-coded dataflows or recipes

Use Dataset Builder when the dataset is a straightforward join of a few related objects and you want it up in 10 minutes. Use a hand-coded dataflow when you need conditional logic, computed fields, or non-standard joins. Use a Data Prep recipe when the source extends beyond Salesforce (S3, Snowflake, BigQuery) or when push-down to a warehouse matters for cost and speed. The three tools coexist; pick the smallest one that solves the problem.
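The rule of thumb above reduces to a short decision function. This Python sketch is a hypothetical helper that encodes only the three criteria named in the paragraph, checked from the most demanding to the least; real choices also depend on org policy, such as a preference for recipes in new production pipelines.

```python
def choose_tool(non_salesforce_source: bool, needs_pushdown: bool,
                needs_custom_logic: bool) -> str:
    """Pick the smallest tool that covers the requirements (rule of thumb only)."""
    if non_salesforce_source or needs_pushdown:
        return "Data Prep recipe"
    if needs_custom_logic:
        return "hand-coded dataflow"
    return "Dataset Builder"


print(choose_tool(non_salesforce_source=False, needs_pushdown=False, needs_custom_logic=False))
# Dataset Builder
```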

§ 03

How to build a dataset with Dataset Builder

Dataset Builder is the fastest way to produce a new CRM Analytics dataset from Salesforce objects. The visual canvas turns relationships into clickable nodes; you pick objects and fields, save, and the underlying dataflow handles the rest.

1. Open Data Manager
In CRM Analytics, open Data Manager, then Datasets, then Create Dataset, then Salesforce Object. This launches Dataset Builder for the new dataset.
2. Pick the root object
Search for and select the primary object (Opportunity for a sales dataset, Case for service, Account for account-centric reporting). The builder loads the object's relationships onto the canvas.
3. Add related objects
Click a related object's plus icon to include it. The canvas grows as you add Account, Owner, and so on. You can traverse multiple hops by clicking through each related box.
4. Pick fields per object
For each object on the canvas, check the fields you want in the dataset. Default selections include the obvious identifiers; add measure fields (Amount, Quantity), date fields (CloseDate), and segmentation fields (StageName, Industry).
5. Name the dataset and save
Click Create Dataset, provide a descriptive name (Opportunities_With_Account is better than New_Dataset_1), and confirm. CRM Analytics generates the dataflow and runs the first extract immediately.
6. Verify the dataset and schedule the dataflow
Open the new dataset to confirm row counts and fields. Then open the generated dataflow in the Dataflow Editor and set the schedule (hourly, every 3 hours, daily) for ongoing refreshes.
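Before you edit or schedule the generated dataflow, a quick structural check can catch broken references. This Python sketch assumes you have copied the dataflow JSON out of the Dataflow Editor into a dict; the `check_dataflow` helper is hypothetical and only inspects the `source`, `left`, and `right` parameters used by common transformations.

```python
def check_dataflow(dataflow: dict) -> list[str]:
    """Return human-readable problems found in a dataflow definition (empty list if none)."""
    problems = []
    for name, node in dataflow.items():
        params = node.get("parameters", {})
        # Every upstream reference must name a node that exists in the same dataflow.
        for key in ("source", "left", "right"):
            ref = params.get(key)
            if ref is not None and ref not in dataflow:
                problems.append(f"{name}.{key} points at missing node {ref}")
    if not any(node.get("action") == "sfdcRegister" for node in dataflow.values()):
        problems.append("no sfdcRegister node, so no dataset will be registered")
    return problems


good = {
    "Extract_Opportunity": {"action": "sfdcDigest", "parameters": {"object": "Opportunity"}},
    "Register_Dataset": {"action": "sfdcRegister",
                         "parameters": {"source": "Extract_Opportunity", "alias": "Opps", "name": "Opps"}},
}
broken = {
    "Register_Dataset": {"action": "sfdcRegister",
                         "parameters": {"source": "Extract_Oportunity", "alias": "Opps", "name": "Opps"}},
}
print(check_dataflow(good))    # []
print(check_dataflow(broken))  # ['Register_Dataset.source points at missing node Extract_Oportunity']
```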

Key options

Root object

The primary object the dataset is built around. Each row in the final dataset corresponds to one root record, with related fields denormalized in.

Related objects

Up to 10 levels of relationships are supported. The practical limit is 4 to 5 hops, beyond which performance and complexity make a dataflow rewrite worthwhile.

Field selection

Standard fields, custom fields, formula fields, and picklists are supported. Long Text Areas over 32 KB and Encrypted fields are excluded automatically.

Save behavior

Save creates the dataflow and runs it once. Edits to the dataset definition require rebuilding through Dataset Builder or editing the generated dataflow JSON directly.

Gotchas

Dataset Builder generates left-outer augments, which means rows on the root object without a related record still appear with nulls. If you need inner-join behavior, edit the dataflow and add a filter step after the augment.
Many-to-many junctions are not auto-traversed. Add the junction as an intermediate object, then pick the far side manually.
Subsequent edits to the generated dataflow (added fields, changed filters) do not surface back in Dataset Builder. The builder is one-way: visual creation, then text-only editing.
Dataset Builder creates a dataflow even in orgs where Data Prep is the preferred path. For warehouse push-down or non-Salesforce sources, build a recipe directly instead.
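The first gotcha is easier to see with data. This Python sketch imitates left-outer augment semantics on in-memory rows and then drops unmatched rows to get inner-join behavior. The helpers are hypothetical stand-ins for the augment and filter steps, not their actual implementation, and they keep only one match per key, roughly like a single-value lookup.

```python
def augment_left_outer(left_rows, right_rows, left_key, right_key, right_select, relationship):
    """Keep every left row; fill right-hand columns with None when no match exists."""
    index = {row[right_key]: row for row in right_rows}  # one match per key (last wins)
    joined = []
    for row in left_rows:
        match = index.get(row[left_key])
        combined = dict(row)
        for field in right_select:
            combined[f"{relationship}.{field}"] = match[field] if match is not None else None
        joined.append(combined)
    return joined


def keep_matched(rows, relationship, key_field="Id"):
    """Emulate an inner join by filtering out rows whose related key is null."""
    return [row for row in rows if row[f"{relationship}.{key_field}"] is not None]


opportunities = [
    {"Id": "006A", "AccountId": "001X", "Amount": 100},
    {"Id": "006B", "AccountId": None, "Amount": 50},  # no related Account
]
accounts = [{"Id": "001X", "Name": "Acme", "Industry": "Energy"}]

joined = augment_left_outer(opportunities, accounts, "AccountId", "Id", ["Id", "Name"], "Account")
print(len(joined), joined[1]["Account.Name"])                  # 2 None
print([row["Id"] for row in keep_matched(joined, "Account")])  # ['006A']
```

Filtering on the related record's Id rather than a descriptive field matters: a matched Account can legitimately have a null Industry, and filtering on that column would drop rows that did join.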


Trust & references

Sources

Cross-checked against the following references.

Data set (Wikipedia)

Official documentation

Straight from the source: Salesforce's reference material on Dataset Builder.

Create a Dataset with the Dataset Builder (Salesforce Help)
Data Integration in CRM Analytics (Salesforce Help)


About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals. More about Dipojjal.

