Dataflow Step
Definition
A Dataflow Step is a single node in a CRM Analytics dataflow definition. It is a JSON object that reads from a source (a Salesforce object, an existing dataset, or an upstream step), applies one action to the rows, and hands its output to the next step in the pipeline. Each step has a unique name, an action (sfdcDigest, augment, filter, computeExpression, sfdcRegister, and so on), and a set of parameters that drive what it does. Steps reference each other by name, building a directed graph that ends in one or more registered datasets.
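To make that structure concrete, here is a minimal sketch of a two-step definition, written as a Python dict so it can be serialized to JSON. It assumes the common layout where each top-level key is a step name and each value holds an "action" and its "parameters". The step names and the short field list are illustrative, not taken from a real org.

```python
import json

# Minimal, illustrative two-step dataflow: extract Opportunity, then register it.
# Top-level keys are unique step names; each step has an "action" and "parameters".
dataflow = {
    "Extract_Opportunity": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Opportunity",
            "fields": [{"name": "Id"}, {"name": "Name"}, {"name": "Amount"}],
        },
    },
    "Register_Opportunities": {
        "action": "sfdcRegister",
        "parameters": {
            "source": "Extract_Opportunity",  # upstream step, referenced by name
            "alias": "Opportunities",
            "name": "Opportunities",
        },
    },
}

# Emit the JSON you would paste into (or compare with) the dataflow's JSON view.
print(json.dumps(dataflow, indent=2))
```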
Since 2023, Data Prep recipes have largely superseded dataflows for new ingestion work, but existing dataflows still run on a schedule or on demand. Steps are defined in the dataflow's JSON and edited either in the JSON editor or the older drag-and-drop visual editor inside CRM Analytics Studio. A typical analytics dataset is built from 5 to 20 dataflow steps wired together.
Anatomy of a dataflow step in CRM Analytics
The step types you actually use
The dozen or so action types fall into ingestion, transformation, and output. sfdcDigest pulls data from Salesforce objects; edgemart loads from an existing dataset. augment joins two streams on matching keys, similar to a left outer join. computeExpression adds calculated fields per row; computeRelative adds fields computed from the previous or next rows within a partition. filter narrows rows; flatten turns a parent-child hierarchy (such as the role hierarchy) into columns listing each record's ancestors. sliceDataset drops columns (or keeps only the ones you name). append stacks the rows of compatible streams. sfdcRegister writes the final stream to a registered dataset that dashboards can query.
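One way to keep those three roles straight when reading an unfamiliar dataflow is to tag each step by its action. The sketch below does that in Python. The ROLES table and the roles_by_step helper are hypothetical conveniences mirroring the grouping above, not part of any Salesforce API, and the example steps carry only the parameters needed to show the wiring.

```python
# Hypothetical lookup table mirroring the ingestion / transformation / output
# grouping described above. Not an official Salesforce list.
ROLES = {
    "sfdcDigest": "ingestion",
    "edgemart": "ingestion",
    "augment": "transformation",
    "computeExpression": "transformation",
    "computeRelative": "transformation",
    "filter": "transformation",
    "flatten": "transformation",
    "sliceDataset": "transformation",
    "append": "transformation",
    "sfdcRegister": "output",
}


def roles_by_step(dataflow: dict) -> dict:
    """Map each step name to its role; actions missing from ROLES become 'unknown'."""
    return {name: ROLES.get(step["action"], "unknown") for name, step in dataflow.items()}


# Trimmed example: parameters reduced to the source references.
example = {
    "Extract_Opp": {"action": "sfdcDigest", "parameters": {"object": "Opportunity"}},
    "Won_Only": {"action": "filter", "parameters": {"source": "Extract_Opp"}},
    "Register": {"action": "sfdcRegister", "parameters": {"source": "Won_Only"}},
}
print(roles_by_step(example))
```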
How steps reference each other
Steps are unordered in the JSON, but the runtime builds an execution graph from each step's source parameter. A step that augments two flows declares left and right sources by name. A step that filters declares a single source. The dataflow runner topologically sorts the graph and runs steps in dependency order, and siblings can run in parallel when their inputs are ready.
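The dependency ordering described above can be modeled with Python's standard graphlib module. This is a sketch of the idea, not Salesforce's scheduler: it assumes single-input steps use "source", augment uses "left" and "right", and append uses a "sources" list, and it groups steps into "waves" whose inputs are all ready, which is where parallel execution could happen. The helper names upstream_steps and execution_waves are hypothetical.

```python
from graphlib import TopologicalSorter


def upstream_steps(params: dict) -> list:
    """Collect the step names a step reads from (assumed parameter names)."""
    deps = []
    if "source" in params:
        deps.append(params["source"])
    for key in ("left", "right"):  # augment's two inputs
        if key in params:
            deps.append(params[key])
    deps.extend(params.get("sources", []))  # append's list of inputs
    return deps


def execution_waves(dataflow: dict) -> list:
    """Group steps into waves; every step in a wave has all its inputs finished."""
    graph = {name: upstream_steps(step["parameters"]) for name, step in dataflow.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the steps form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Steps deliberately listed out of order: JSON order does not matter.
example = {
    "Register": {"action": "sfdcRegister", "parameters": {"source": "Opp_With_Account"}},
    "Opp_With_Account": {"action": "augment",
                         "parameters": {"left": "Extract_Opp", "right": "Extract_Account"}},
    "Extract_Opp": {"action": "sfdcDigest", "parameters": {"object": "Opportunity"}},
    "Extract_Account": {"action": "sfdcDigest", "parameters": {"object": "Account"}},
}
print(execution_waves(example))
```

The two digests land in the same wave because neither depends on the other; the augment waits for both, and the register step waits for the augment.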
sfdcDigest and the Bulk API behind it
sfdcDigest is how raw Salesforce data enters the dataflow. You specify the object and the list of fields, and the runner issues a Bulk API query to pull rows. The incremental parameter toggles incremental extraction: when it is true, only rows changed since the last successful run are pulled, based on each record's SystemModstamp field. Incremental mode dramatically reduces runtime on large objects, but the first run is always a full extract.
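Incremental extraction boils down to a watermark comparison. The sketch below simulates it in plain Python under simple assumptions: rows are dicts, SystemModstamp is a timezone-aware datetime, and the caller remembers when the last successful run happened. The record IDs are made up, and this models the behavior only; it is not how the dataflow runner is implemented.

```python
from datetime import datetime, timezone


def rows_to_extract(rows, last_success=None):
    """Simulated incremental extract.

    No previous successful run: return every row (full extract).
    Otherwise: return only rows whose SystemModstamp is after the last run.
    """
    if last_success is None:
        return list(rows)
    return [row for row in rows if row["SystemModstamp"] > last_success]


def utc_day(day):
    # Helper for readable example timestamps in January 2024, UTC.
    return datetime(2024, 1, day, tzinfo=timezone.utc)


# Illustrative records with made-up IDs.
opportunities = [
    {"Id": "006A", "SystemModstamp": utc_day(1)},
    {"Id": "006B", "SystemModstamp": utc_day(5)},
    {"Id": "006C", "SystemModstamp": utc_day(9)},
]

first_run = rows_to_extract(opportunities)  # no watermark yet: full extract
next_run = rows_to_extract(opportunities, last_success=utc_day(4))
print([r["Id"] for r in first_run])
print([r["Id"] for r in next_run])
```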
Computed columns and SAQL expressions
computeExpression and computeRelative steps add columns whose values are SAQL expressions. SAQL is CRM Analytics's query language; in this context it acts like a row-by-row formula language. You can reference other columns, do arithmetic, format strings, and apply case logic. computeRelative additionally partitions and orders the rows, and its expressions can read neighboring rows through functions such as previous() and next(), which covers time-series math such as running totals and previous-row values.
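The difference between the two steps is easiest to see by simulating them. The Python below illustrates the semantics only: it is not SAQL and not how the dataflow engine runs. The field names (AccountId, CloseDate, Amount, Size, PrevAmount, AmountChange) are hypothetical. The first loop behaves like computeExpression (each value depends only on its own row); the second behaves like computeRelative (partition, order, then look at the previous row).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical opportunity rows.
rows = [
    {"AccountId": "A2", "CloseDate": "2024-01-20", "Amount": 80.0},
    {"AccountId": "A1", "CloseDate": "2024-02-15", "Amount": 250.0},
    {"AccountId": "A1", "CloseDate": "2024-01-10", "Amount": 100.0},
]

# computeExpression-style: per-row case logic that sees only the current row.
for row in rows:
    row["Size"] = "Large" if row["Amount"] >= 200 else "Small"

# computeRelative-style: partition by AccountId, order by CloseDate
# (ISO date strings sort chronologically), then look back one row.
rows.sort(key=itemgetter("AccountId", "CloseDate"))
for _, partition in groupby(rows, key=itemgetter("AccountId")):
    previous = None  # the first row of each partition has no predecessor
    for row in partition:
        row["PrevAmount"] = previous
        row["AmountChange"] = None if previous is None else row["Amount"] - previous
        previous = row["Amount"]

for row in rows:
    print(row)
```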
Augmenting and the left-outer semantics
augment performs the dataflow equivalent of a left outer join. Rows from the right stream are matched against the left on configured key fields, and the selected right-side fields are added to each left row under a configurable relationship prefix. If a left row has no matching right row, the added fields are null. There is no right-outer or inner-join action: to get inner-join semantics, augment first, then filter out rows where a field brought over from the right stream (such as its key) is null.
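A small simulation makes the left-outer behavior and the augment-then-filter trick concrete. This is a sketch under simplifying assumptions: a single key field on each side, added columns named "<relationship>.<field>", and the first right row winning when keys repeat. The real action's parameters are richer than this, and the augment helper here is hypothetical.

```python
def augment(left, right, left_key, right_key, relationship, right_select):
    """Simplified left-outer lookup in the spirit of the augment action.

    Every left row is kept. Matched right rows contribute the fields in
    right_select, renamed "<relationship>.<field>"; unmatched left rows get
    None for those fields. Duplicate right keys: the first row wins (assumption).
    """
    index = {}
    for row in right:
        index.setdefault(row[right_key], row)
    out = []
    for row in left:
        match = index.get(row[left_key])
        added = {
            f"{relationship}.{field}": (match[field] if match is not None else None)
            for field in right_select
        }
        out.append({**row, **added})
    return out


opportunities = [
    {"Id": "O1", "AccountId": "A1"},
    {"Id": "O2", "AccountId": "A9"},  # no matching account
]
accounts = [{"Id": "A1", "Name": "Acme"}]

joined = augment(opportunities, accounts, "AccountId", "Id", "Account", ["Id", "Name"])
# Inner-join semantics: drop rows whose right-side key came back null.
inner = [row for row in joined if row["Account.Id"] is not None]
print(joined)
print([row["Id"] for row in inner])
```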
Registering the output dataset
A dataflow only persists output when an sfdcRegister step runs. Each register step declares its source step, a dataset alias and name, and optional security settings. A single dataflow can register multiple datasets, which is how teams build a star schema (one fact dataset and several dimension datasets) in one pipeline. The register step is also where you set row-level security predicates, which CRM Analytics then enforces on every query against the dataset, including the queries behind dashboards and lenses.
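Here is a sketch of the tail of such a pipeline: one fact dataset and one dimension dataset registered from the same dataflow, with a row-level security predicate on the fact. The parameter name rowLevelSecurityFilter and the predicate syntax are assumptions to confirm against the sfdcRegister reference before use. The step names, aliases, and upstream steps (Opp_With_Account, Extract_Account) are hypothetical.

```python
import json

# Hypothetical star-schema tail: a fact dataset and a dimension dataset
# registered from one dataflow. Names and upstream steps are illustrative.
register_steps = {
    "Register_Opportunity_Fact": {
        "action": "sfdcRegister",
        "parameters": {
            "source": "Opp_With_Account",
            "alias": "opportunity_fact",
            "name": "Opportunity Fact",
            # Assumed parameter name and predicate syntax: each user sees
            # only the rows they own.
            "rowLevelSecurityFilter": "'OwnerId' == \"$User.Id\"",
        },
    },
    "Register_Account_Dim": {
        "action": "sfdcRegister",
        "parameters": {
            "source": "Extract_Account",
            "alias": "account_dim",
            "name": "Account Dimension",
        },
    },
}

print(json.dumps(register_steps, indent=2))
```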
Scheduling, monitoring, and the move to Recipes
Dataflows run on a configurable schedule (every 1, 3, 8, 12, or 24 hours) or on demand from the Data Manager. Each run produces a job log with per-step timing and row counts, surfaced in the Job Monitor. Since 2023, Salesforce has been pushing teams toward Data Prep recipes: a node-based visual editor with overlapping capabilities, native push-down to Snowflake and BigQuery, and clearer transformation semantics. New ingestion work belongs in Recipes; mature dataflows are still supported but not getting new features.
How to add a dataflow step to an existing CRM Analytics dataflow
You edit dataflow steps either in the Dataflow Editor's visual canvas or in the JSON editor. The JSON view is faster once you know the action vocabulary; the visual editor catches reference errors before you save.
- Open the Dataflow Editor
In CRM Analytics, go to Data Manager, then Dataflows and Recipes, pick your dataflow, then Edit. The canvas opens with each existing step as a node.
- Add a new step
Click the action you need from the left rail (Filter, Augment, Compute Expression, etc.) and drop it onto the canvas. The editor inserts an empty step skeleton with placeholder parameters.
- Connect the source
Drag from the previous step's output port to the new step's input. The editor wires the source parameter for you. For augment, drag two source connections: one for the left stream and one for the right.
- Configure parameters
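Open the step and fill in its action-specific parameters. computeExpression needs a column name, a type, and a SAQL expression. filter needs a filter condition, either in the filter parameter's syntax or as a saqlFilter expression. augment needs the left and right join keys, the right-side fields to bring over, and a relationship name that prefixes the added columns.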
- Validate and save
Click Update Dataflow. The editor validates the JSON, surfaces missing references, and warns about unregistered outputs. Fix any red flags before continuing.
- Run on demand to verify
From the Data Manager, click Run Now next to the dataflow. Watch the Job Monitor for per-step row counts and timing. Confirm your new step's output looks right before relying on it in a dashboard.
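If you work in the JSON view instead of the canvas, adding a step amounts to inserting one entry and rewiring one source reference. The sketch below shows that with two hypothetical helpers: insert_step places a new single-input step in front of an existing one, and missing_references lists names that point at nonexistent steps, which only approximates the checks Update Dataflow performs. The saqlFilter string is illustrative; confirm the exact expression syntax in the filter reference.

```python
import copy


def insert_step(dataflow, new_name, new_step, downstream):
    """Insert new_step in front of `downstream` and rewire the references.

    Assumes `downstream` is a single-input step using "source". The new step
    inherits that source, and `downstream` then reads from new_name.
    Returns a new dict; the original is left untouched.
    """
    if new_name in dataflow:
        raise ValueError(f"step name {new_name!r} already exists")
    updated = copy.deepcopy(dataflow)
    step = copy.deepcopy(new_step)
    step["parameters"]["source"] = updated[downstream]["parameters"]["source"]
    updated[new_name] = step
    updated[downstream]["parameters"]["source"] = new_name
    return updated


def missing_references(dataflow):
    """Return (step, referenced_name) pairs that point at nonexistent steps."""
    problems = []
    for name, step in dataflow.items():
        params = step["parameters"]
        refs = [params[k] for k in ("source", "left", "right") if k in params]
        refs += params.get("sources", [])
        problems += [(name, ref) for ref in refs if ref not in dataflow]
    return problems


flow = {
    "Extract_Opp": {"action": "sfdcDigest",
                    "parameters": {"object": "Opportunity",
                                   "fields": [{"name": "Id"}, {"name": "StageName"}]}},
    "Register": {"action": "sfdcRegister",
                 "parameters": {"source": "Extract_Opp", "alias": "opps", "name": "Opportunities"}},
}
won_filter = {"action": "filter", "parameters": {"saqlFilter": "StageName == \"Closed Won\""}}

flow2 = insert_step(flow, "Won_Only", won_filter, downstream="Register")
print(flow2["Register"]["parameters"]["source"], missing_references(flow2))
```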
- sfdcDigest: Pulls rows from a Salesforce object via the Bulk API. Turn on incremental extraction (incremental: true) for large tables.
- augment: Left-outer join of two streams on configurable keys. Adds the right stream's fields to the left with a prefix.
- computeExpression: Adds a calculated column whose value is a SAQL expression evaluated per row. Equivalent to a formula field at the dataflow layer.
- filter: Narrows rows using a filter condition or a saqlFilter clause. It only sees the fields that exist at its position in the graph, so place it after augments and computes to filter on joined or derived fields.
- sfdcRegister: Persists the output of a stream as a dataset. Includes optional row-level security predicates and dataset sharing.
- Dataflows are no longer the recommended tool for new ingestion: since 2023, Salesforce has steered new pipelines to Data Prep recipes. Build new pipelines in Data Prep Recipes unless you are extending an existing dataflow.
- sfdcDigest queries count against your Bulk API limits. A nightly full extract on a multi-million-row object can dominate your daily Bulk usage.
- Incremental sfdcDigest relies on SystemModstamp. Changes that don't update a record's SystemModstamp, such as a formula field whose value shifts because a related record changed, are missed by incremental runs until the next full extract.
- Filtering happens on whatever fields exist at that point in the graph. Place filters after augments and computes if you need to filter on joined or derived columns.
About the Author
Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.