What is the most important tip for working with Dataflow Step?

A dataflow without an sfdcRegister step produces no output. The runner happily computes 19 chained steps and discards everything if the final register is missing. This is the most common surprise the first time you build a dataflow.


Dataflow Step

Q: Which Dataflow Step action writes the final stream to a registered dataset that dashboards can query?

sfdcRegister is the only action that persists output as a registered dataset. sfdcDigest ingests, augment joins, and computeExpression adds computed columns.

Q: How does the dataflow runtime decide the execution order of Dataflow Steps?

The runtime builds a graph from each step's source parameter and topologically sorts it. Steps are unordered in the JSON; they are not run strictly top-to-bottom, all at once, or alphabetically.

Q: Since 2023, what has Salesforce positioned as the preferred tool for new ingestion work over dataflows?

Data Prep recipes are the node-based successor for new ingestion work. Validation Rules and Workflow Rules are unrelated automation, and Apex batch is not a mandated replacement.



§ 01

Definition

A Dataflow Step is a single node in a CRM Analytics dataflow definition. It is a JSON snippet that pulls in a source, transforms the rows, and hands the output to the next step in the pipeline. Each step has a unique name, an action (sfdcDigest, augment, filter, computeExpression, sfdcRegister, and so on), and a set of parameters that drive what it does. Steps reference each other by name, building a directed graph that ends in one or more registered datasets.
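
As a rough illustration of that shape, the sketch below builds a two-step dataflow definition as a Python dictionary and serializes it to JSON. The step names (Extract_Opportunity, Register_Opportunity), the object, and the field list are invented for the example. The parameter keys (object, fields, alias, name, source) follow the layout commonly seen in dataflow JSON and should be checked against the Dataflow Transformations Reference before use.

```python
import json

# Hypothetical two-step dataflow: one digest feeding one register.
dataflow = {
    "Extract_Opportunity": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Opportunity",
            "fields": [{"name": "Id"}, {"name": "Amount"}, {"name": "StageName"}],
        },
    },
    "Register_Opportunity": {
        "action": "sfdcRegister",
        "parameters": {
            "alias": "Opportunities",
            "name": "Opportunities",
            # Steps reference each other by name.
            "source": "Extract_Opportunity",
        },
    },
}

# Serialize to the JSON text you would paste into the JSON editor.
definition_json = json.dumps(dataflow, indent=2)
print(definition_json)
```

Each top-level key is the unique step name, and the only link between the two steps is the source value.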

Dataflows have largely been superseded by Data Prep recipes since 2023 for new ingestion work, but existing dataflows still run on a scheduled or on-demand basis. Steps are defined in the dataflow's JSON and edited either in the JSON editor or the older drag-and-drop visual editor inside CRM Analytics Studio. A typical analytics dataset is built from 5 to 20 dataflow steps wired together.

§ 02

Anatomy of a dataflow step in CRM Analytics

The step types you actually use

The dozen or so action types break into ingestion, transformation, and output. sfdcDigest pulls data from Salesforce objects; edgemart loads from an existing dataset. augment joins two streams on matching keys, similar to a left outer join. computeExpression adds calculated fields per row; computeRelative adds fields based on lag or lead within a partition. filter narrows rows; flatten expands hierarchical relationships into denormalized columns. sliceDataset drops columns. append stacks two compatible streams. sfdcRegister writes the final stream to a registered dataset that dashboards can query.

How steps reference each other

Steps are unordered in the JSON, but the runtime builds an execution graph from each step's source parameter. A step that augments two flows declares left and right sources by name. A step that filters declares a single source. The dataflow runner topologically sorts the graph and runs steps in dependency order, and siblings can run in parallel when their inputs are ready.
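
The dependency ordering described above can be sketched with Python's standard graphlib module. This is only a model of the idea, not the runner's actual implementation. It assumes single-input steps name their input in source and augment steps use left and right; the step names are hypothetical.

```python
from graphlib import TopologicalSorter


def upstream_steps(step):
    """Return the names of the steps this step reads from."""
    params = step.get("parameters", {})
    # Assumed keys: "source" for single-input steps, "left"/"right" for augment.
    return [params[key] for key in ("source", "left", "right") if key in params]


def execution_order(dataflow):
    """Return step names ordered so every step comes after its inputs."""
    graph = {name: upstream_steps(step) for name, step in dataflow.items()}
    return list(TopologicalSorter(graph).static_order())


# Deliberately listed "backwards" to show that JSON order does not matter.
dataflow = {
    "Register": {"action": "sfdcRegister", "parameters": {"source": "Join"}},
    "Join": {"action": "augment", "parameters": {"left": "Opps", "right": "Accounts"}},
    "Opps": {"action": "sfdcDigest", "parameters": {"object": "Opportunity"}},
    "Accounts": {"action": "sfdcDigest", "parameters": {"object": "Account"}},
}

order = execution_order(dataflow)
print(order)
```

Opps and Accounts have no dependency on each other, so either may come first; those are the siblings that could run in parallel.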

sfdcDigest and the Bulk API behind it

sfdcDigest is how raw Salesforce data enters the dataflow. You specify the object and the list of fields, and the runner issues a Bulk API query to pull rows. The complete parameter toggles incremental extraction: if false, only rows changed since the last successful run are pulled, using the SystemModstamp index. Incremental mode dramatically reduces runtime on large objects, but the first run is always a full extract.
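
To make the incremental behaviour concrete, here is a toy model in Python. It does not call Salesforce or the Bulk API; it only mimics the rule described above, where the first run is a full extract and later runs keep rows whose SystemModstamp is newer than the last successful run. The extract function and the sample rows are hypothetical.

```python
from datetime import datetime


def extract(rows, last_success=None):
    """Simulate a digest: full extract on the first run, changed rows afterwards."""
    if last_success is None:
        # No previous successful run, so everything is pulled.
        return list(rows)
    return [row for row in rows if row["SystemModstamp"] > last_success]


rows = [
    {"Id": "001A", "SystemModstamp": datetime(2024, 1, 1)},
    {"Id": "001B", "SystemModstamp": datetime(2024, 1, 5)},
    {"Id": "001C", "SystemModstamp": datetime(2024, 1, 9)},
]

first_run = extract(rows)
second_run = extract(rows, last_success=datetime(2024, 1, 4))
print(len(first_run), [row["Id"] for row in second_run])
```

The model also shows why a record whose SystemModstamp is not bumped on update would never appear in an incremental run.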

Computed columns and SAQL expressions

computeExpression and computeRelative steps add columns whose values are SAQL expressions. SAQL is CRM Analytics's query language; in this context it acts like a row-by-row formula language. You can reference other columns, do arithmetic, format strings, and apply case logic. computeRelative additionally supports lag(), lead(), and partition functions for time-series math such as running totals and previous-row values.
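
The difference between the two step types can be illustrated outside SAQL with a small Python analogue. The column names (Weighted, PrevAmount, RunningTotal) and the sample rows are made up, and the code only mirrors the semantics described above rather than any SAQL syntax.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"Account": "Acme", "Month": 1, "Amount": 100.0, "Probability": 50},
    {"Account": "Acme", "Month": 2, "Amount": 300.0, "Probability": 80},
    {"Account": "Zen", "Month": 1, "Amount": 200.0, "Probability": 10},
]

# computeExpression analogue: a value derived from other columns of the same row.
for row in rows:
    row["Weighted"] = row["Amount"] * row["Probability"] / 100

# computeRelative analogue: previous-row value and running total per partition.
rows.sort(key=itemgetter("Account", "Month"))
for _, partition in groupby(rows, key=itemgetter("Account")):
    previous, running = None, 0.0
    for row in partition:
        row["PrevAmount"] = previous  # None for the first row of a partition
        running += row["Amount"]
        row["RunningTotal"] = running
        previous = row["Amount"]

for row in rows:
    print(row)
```

The first loop needs only the current row; the second needs the rows sorted and grouped, which is the extra configuration a relative computation requires.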

Augmenting and the left-outer semantics

augment performs the dataflow equivalent of a left outer join. The right stream's rows are matched against the left by a configured key, and the right's fields are added to the left's rows with a configurable prefix. If a left row has no matching right row, the added fields are null. There is no right-outer or inner-join action: to get inner-join semantics, augment and then filter out rows where the added right-side key field is null.
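
A small Python simulation may help picture the left-outer behaviour and the augment-then-filter workaround for inner joins. The augment function here is a hypothetical stand-in, not the platform's implementation. It assumes added fields are named prefix.field and that only the first matching right row is used, which is a simplification.

```python
def augment(left, right, left_key, right_key, prefix, right_select):
    """Left-outer lookup: every left row survives; unmatched rows get None."""
    lookup = {}
    for row in right:
        # Keep the first right row per key (simplifying assumption).
        lookup.setdefault(row[right_key], row)
    result = []
    for row in left:
        match = lookup.get(row[left_key])
        added = {
            f"{prefix}.{field}": (match[field] if match else None)
            for field in right_select
        }
        result.append({**row, **added})
    return result


opps = [{"Id": "006A", "AccountId": "001A"}, {"Id": "006B", "AccountId": "001X"}]
accounts = [{"Id": "001A", "Name": "Acme"}]

joined = augment(opps, accounts, "AccountId", "Id", "Account", ["Id", "Name"])

# Inner-join semantics: drop rows where the right-side key came back null.
inner = [row for row in joined if row["Account.Id"] is not None]
print(joined)
print(inner)
```

Note that the filter tests the prefixed right-side field, since the left row's own key is populated whether or not a match was found.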

Registering the output dataset

A dataflow only persists output when an sfdcRegister step runs. Each register step declares a target dataset name, an alias, and optional sharing rules. A single dataflow can register multiple datasets, which is how teams build a star schema (one fact dataset and several dimension datasets) in one pipeline. The register step is also where you set row-level security predicates that the front-end then enforces on dashboards and lenses.
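
Because unregistered output is silently discarded, it can be worth checking a definition before running it. The following sketch walks backwards from every sfdcRegister step and reports steps whose output never reaches one. The helper name unregistered_steps and the step names are hypothetical, and the input keys (source, left, right) are assumptions about the JSON layout.

```python
def unregistered_steps(dataflow):
    """Return step names whose output never reaches an sfdcRegister step."""

    def inputs(step):
        params = step.get("parameters", {})
        return [params[key] for key in ("source", "left", "right") if key in params]

    reached = set()
    # Start from every register step and walk upstream.
    stack = [name for name, step in dataflow.items() if step["action"] == "sfdcRegister"]
    while stack:
        name = stack.pop()
        if name in reached or name not in dataflow:
            continue
        reached.add(name)
        stack.extend(inputs(dataflow[name]))
    return sorted(set(dataflow) - reached)


dataflow = {
    "Opps": {"action": "sfdcDigest", "parameters": {"object": "Opportunity"}},
    "Won": {"action": "filter", "parameters": {"source": "Opps"}},
    "Lost": {"action": "filter", "parameters": {"source": "Opps"}},
    "Register_Won": {"action": "sfdcRegister", "parameters": {"source": "Won"}},
}

dangling = unregistered_steps(dataflow)
print(dangling)
```

Here the Lost branch is computed but never registered, so it is reported. A dataflow with no register step at all would report every step.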

Scheduling, monitoring, and the move to Recipes

Dataflows run on a configurable schedule (every 1, 3, 8, 12, or 24 hours) or on demand from the Data Manager. Each run produces a job log with per-step timing and row counts, surfaced in the Job Monitor. Since 2023, Salesforce has been pushing teams toward Data Prep recipes: a node-based visual editor with overlapping capabilities, native push-down to Snowflake and BigQuery, and clearer transformation semantics. New ingestion work belongs in Recipes; mature dataflows are still supported but not getting new features.

§ 03

How to add a dataflow step to an existing CRM Analytics dataflow

You edit dataflow steps either in the Dataflow Editor's visual canvas or in the JSON editor. The JSON view is faster once you know the action vocabulary; the visual editor catches reference errors before you save.

Open the Dataflow Editor
In CRM Analytics, go to Data Manager, then Dataflows and Recipes, pick your dataflow, then Edit. The canvas opens with each existing step as a node.
Add a new step
Click the action you need from the left rail (Filter, Augment, Compute Expression, etc.) and drop it onto the canvas. The editor inserts an empty step skeleton with placeholder parameters.
Connect the source
Drag from the previous step's output port to the new step's input. The editor wires the source parameter for you. For augment, drag two source connections: one for the left stream and one for the right.
Configure parameters
Open the step and fill in its action-specific parameters. computeExpression needs a column name, a type, and a SAQL expression. filter needs a saqlFilter clause. augment needs the join keys and a prefix for the added columns.
Validate and save
Click Update Dataflow. The editor validates the JSON, surfaces missing references, and warns about unregistered outputs. Fix any red flags before continuing.
Run on demand to verify
From the Data Manager, click Run Now next to the dataflow. Watch the Job Monitor for per-step row counts and timing. Confirm your new step's output looks right before relying on it in a dashboard.
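
For those working in the JSON view instead of the canvas, the same edit can be sketched programmatically. The example below inserts a filter step after an existing step and rewires downstream consumers to read from it. The function insert_filter, the step names, and the filter expression are all hypothetical; the saqlFilter key is taken from the description above, and the exact expression syntax should be verified against the reference documentation.

```python
import copy


def insert_filter(dataflow, new_name, upstream, saql_filter):
    """Return a copy with a filter step placed after `upstream`, rewiring its consumers."""
    result = copy.deepcopy(dataflow)
    # Point every step that read from `upstream` at the new step instead.
    for step in result.values():
        params = step["parameters"]
        for key in ("source", "left", "right"):
            if params.get(key) == upstream:
                params[key] = new_name
    # Add the new step itself, reading from the original upstream step.
    result[new_name] = {
        "action": "filter",
        "parameters": {"source": upstream, "saqlFilter": saql_filter},
    }
    return result


dataflow = {
    "Extract": {"action": "sfdcDigest", "parameters": {"object": "Opportunity"}},
    "Register": {"action": "sfdcRegister", "parameters": {"source": "Extract"}},
}

updated = insert_filter(dataflow, "Won_Only", "Extract", 'StageName == "Closed Won"')
print(updated["Register"]["parameters"]["source"])
```

Working on a deep copy leaves the original definition untouched, which makes it easy to compare the two versions before saving.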

Key options

sfdcDigest

Pulls rows from a Salesforce object via the Bulk API. Use complete: false for incremental loads on large tables.

augment

Left-outer join of two streams on configurable keys. Adds the right stream's fields to the left with a prefix.

computeExpression

Adds a calculated column whose value is a SAQL expression evaluated per row. Equivalent to a formula field at the dataflow layer.

filter

Narrows rows using a saqlFilter clause. It sees only the fields that exist at its position in the graph, so place it after augments and computes when you need to filter on joined or derived fields.

sfdcRegister

Persists the output of a stream as a dataset. Includes optional row-level security predicates and dataset sharing.

Gotchas

Dataflows have been superseded for new ingestion work since 2023; existing ones remain supported but are not getting new features. Build new pipelines in Data Prep Recipes unless you are extending an existing dataflow.
sfdcDigest queries count against your Bulk API limits. A nightly full extract on a multi-million-row object can dominate your daily Bulk usage.
Incremental sfdcDigest relies on SystemModstamp. Records updated by background jobs that suppress SystemModstamp will be missed by incremental runs.
Filtering happens on whatever fields exist at that point in the graph. Place filters after augments and computes if you need to filter on joined or derived columns.




About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
