Salesforce Developer · hard

What are the patterns for handling bulk data loads (1M+ records) into Salesforce?

Loading 1M+ records demands deliberate architecture. The patterns below are ordered roughly by scale.

1. Bulk API 2.0 — the workhorse for one-time loads.

```bash
# Bulk API 2.0 ingest via the Salesforce CLI
sf data import bulk --sobject Account --file accounts.csv
```

Behind the scenes: Salesforce splits the data into chunks and processes them in parallel where possible, with per-row error reporting. Budget roughly 100 MB of raw CSV per job (the hard limit is 150 MB of base64-encoded content); split into multiple jobs for larger loads.

Pros: Salesforce-native, no code, built-in error handling. Cons: still hits triggers/flows on every record (which can be the bottleneck).
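
Under the hood it is a simple three-step REST lifecycle. A minimal sketch with curl (the My Domain host, API version v59.0, $TOKEN, and $JOB_ID are placeholders):

```bash
# 1. Create the ingest job
curl -X POST https://MyDomain.my.salesforce.com/services/data/v59.0/jobs/ingest \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"object":"Account","operation":"insert","contentType":"CSV"}'

# 2. Upload the CSV (jobId comes from step 1's response)
curl -X PUT https://MyDomain.my.salesforce.com/services/data/v59.0/jobs/ingest/$JOB_ID/batches \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: text/csv" \
  --data-binary @accounts.csv

# 3. Mark the upload complete; Salesforce chunks, queues, and processes it
curl -X PATCH https://MyDomain.my.salesforce.com/services/data/v59.0/jobs/ingest/$JOB_ID \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"state":"UploadComplete"}'
```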

2. Bulk API 1.0 with parallel mode — older, but with explicit batch sizing: you split the file into batches yourself (up to 10,000 records each). Still useful when you need fine-grained control.
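
A sketch of that flow, where each batch you POST is one explicitly sized unit of work (host, API version, $SESSION_ID, and $JOB_ID are placeholders):

```bash
# 1. Create a parallel-mode job (XML payload, classic Bulk API)
curl -X POST https://MyDomain.my.salesforce.com/services/async/59.0/job \
  -H "X-SFDC-Session: $SESSION_ID" -H "Content-Type: application/xml" \
  -d '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
        <operation>insert</operation>
        <object>Account</object>
        <concurrencyMode>Parallel</concurrencyMode>
        <contentType>CSV</contentType>
      </jobInfo>'

# 2. Each POST is one batch -- you choose the size by how you split the CSV
curl -X POST https://MyDomain.my.salesforce.com/services/async/59.0/job/$JOB_ID/batch \
  -H "X-SFDC-Session: $SESSION_ID" -H "Content-Type: text/csv" \
  --data-binary @accounts_part_01.csv
```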

3. Apex Batch — when load logic is complex (transformations, related-record creation, calculations).

```apex
public class DataMigration implements Database.Batchable<sObject>, Database.Stateful {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // A QueryLocator scales to ~50M rows; returning a list from an inline
        // query is capped by SOQL row limits and won't survive a 1M+ load
        return Database.getQueryLocator(
            [SELECT Id FROM Source_Object__c WHERE Migrated__c = false]
        );
    }
    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // transform and insert into target
    }
    public void finish(Database.BatchableContext bc) {
        // summary notification, or chain the next job
    }
}
// Run with a scope size of 200 records per execute call
Database.executeBatch(new DataMigration(), 200);
```

Pros: full Apex flexibility, error recovery per batch, retryable. Cons: governor limits per batch; slower than Bulk API for raw insert.
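
To make the per-batch error recovery concrete, a sketch of a partial-success insert inside execute (Target_Object__c and the transform helper are assumptions):

```apex
// Inside execute(): allOrNone=false lets good rows through and reports bad ones
List<Target_Object__c> targets = transform(scope); // hypothetical transform helper
List<Database.SaveResult> results = Database.insert(targets, false);
for (Integer i = 0; i < results.size(); i++) {
    if (!results[i].isSuccess()) {
        // Collect failures (e.g., in a Database.Stateful list) for retry or review
        System.debug('Row ' + i + ' failed: ' + results[i].getErrors()[0].getMessage());
    }
}
```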

4. Suppress automation during load

A "Skip Automation" custom field on the User running the load, checked in every trigger:

```apex
trigger AccountTrigger on Account (before insert, before update) { // events as needed
    // Bail out when the running (standard) user's bypass flag is set
    if (UserInfo.getUserType() == 'Standard'
        && [SELECT Skip_Automation__c FROM User WHERE Id = :UserInfo.getUserId()].Skip_Automation__c) {
        return;
    }
    // normal logic
}
```

For one-off loads: set the flag, run the load, then unset it. This avoids the trigger overhead.

5. External transformation pipeline

For very large loads (10M+ records), do the transformation outside Salesforce:

Source DB -> Extract -> MuleSoft / Snowflake / Python -> Transform -> Bulk API to Salesforce

Salesforce only sees the final clean data; expensive transformations happen elsewhere.
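
The final leg might look like this (transform.py is a stand-in for whatever does the heavy lifting outside the org):

```bash
# Transform outside Salesforce (transform.py is a hypothetical pipeline step)
python transform.py --in raw_extract.csv --out accounts_clean.csv

# Load only the clean, final-shape data
sf data import bulk --sobject Account --file accounts_clean.csv
```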

6. Defer Sharing Calculations

For loads that change record ownership on objects with a Private org-wide default, sharing recalculation can take hours.

Setup -> Defer Sharing Calculations -> Suspend ... run the load ... Setup -> Resume

Recalculation happens once at the end instead of per record. (Note: the feature must be enabled by Salesforce Support first.)

7. Disable validation rules and triggers temporarily

For load-only scenarios where you trust the data, mass-disable validation rules / triggers:

  • Validation rules: deactivate via Metadata API or Setup.
  • Triggers: many orgs gate every trigger on a "Trigger_Off__c" flag in a hierarchy Custom Setting (see the sketch below).

Risky — easy to forget to re-enable. Document and use sparingly.
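
A common shape for that kill switch, assuming a hierarchy Custom Setting named Trigger_Settings__c with a Trigger_Off__c checkbox (both names are assumptions):

```apex
trigger ContactTrigger on Contact (before insert, before update) {
    // getInstance() resolves user -> profile -> org-level values, in that order
    if (Trigger_Settings__c.getInstance().Trigger_Off__c) {
        return; // automation suppressed for this user, profile, or the whole org
    }
    // normal logic
}
```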

8. Big Objects for archive scenarios

If the load is historical data that won't be queried often, a Big Object stores it cheaply.
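
Writing archive rows from Apex uses its own DML call (the big object Order_Archive__b and its fields are assumptions):

```apex
// Big objects are written with insertImmediate, not standard DML
Order_Archive__b row = new Order_Archive__b(
    Order_Id__c    = 'ORD-000001',  // index field (assumed)
    Archived_On__c = System.now()
);
Database.insertImmediate(row);
```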

9. Index-aware loading

Sort source data so duplicate-key checks hit the index efficiently. External-ID-based upserts rely on the key field being indexed; marking a field as External ID indexes it automatically.
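
For example, an external-ID upsert from the CLI (External_Key__c is an assumed field name):

```bash
# Upsert keyed on an indexed External ID field
sf data upsert bulk --sobject Account --file accounts.csv --external-id External_Key__c
```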

10. Monitor and adjust

Watch sharing recalc, governor limit hits, trigger time. Adjust batch size if performance degrades.
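
For Apex batch loads, AsyncApexJob exposes progress and error counts; a sketch, reusing the DataMigration class from pattern 3:

```apex
Id jobId = Database.executeBatch(new DataMigration(), 200);
// Later (e.g., from Anonymous Apex): poll progress and error counts
AsyncApexJob job = [
    SELECT Status, JobItemsProcessed, TotalJobItems, NumberOfErrors, ExtendedStatus
    FROM AsyncApexJob
    WHERE Id = :jobId
];
System.debug(job.Status + ': ' + job.JobItemsProcessed + '/' + job.TotalJobItems +
    ' batches, ' + job.NumberOfErrors + ' errors');
```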

Pre-load checklist:

  1. Test in Full Sandbox with full volume.
  2. Backup target objects (export current state).
  3. Confirm rollback plan.
  4. Schedule for low-traffic window.
  5. Notify users of any expected slowness.
  6. Have a runbook for "if it hits limits, what then".

Post-load:

  1. Validate row counts (see the count query after this list).
  2. Spot-check sample records.
  3. Confirm sharing recalc completed.
  4. Re-enable suppressed automation.
  5. Monitor for downstream issues.
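
For step 1, a quick count from the CLI to compare against the source file (object and filter are examples):

```bash
# Rows loaded today should match the source file's row count
sf data query --query "SELECT COUNT() FROM Account WHERE CreatedDate = TODAY"
```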

A 1M+ load is more about preparation than execution. Get the prep right; execution is the easy part.

Why this answer works

This is a senior architect's answer: the pre-load checklist and post-load validation are the marks of someone who has done this in production.
