Loading 1M+ records demands deliberate architecture. The patterns below are ordered roughly by scale.
1. Bulk API 2.0 — the workhorse for one-time loads.
```bash
sf data create job --object Account --operation insert --file accounts.csv
```
Behind the scenes, Salesforce splits the file into chunks and processes them in parallel where possible, with per-row error reporting. There's a roughly 100 MB file size limit per job; split larger datasets across multiple jobs.
Pros: Salesforce-native, no code, built-in error handling. Cons: still hits triggers/flows on every record (which can be the bottleneck).
2. Bulk API 1.0 with parallel mode — older, but it gives you explicit batch sizing and concurrency control (Parallel vs. Serial). Still useful when you need fine-grained control.
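To see what that control looks like at the API level, here's a minimal sketch of creating a 1.0 job with Parallel concurrency from Anonymous Apex. The API version is arbitrary, and the callout assumes the org's own domain is registered as a Remote Site; in practice this is usually driven from Data Loader or an ETL tool rather than Apex.

```apex
// Create a Bulk API 1.0 job with explicit Parallel concurrency.
// Batches (up to 10,000 records each) are then POSTed to /job/{jobId}/batch.
HttpRequest req = new HttpRequest();
req.setEndpoint(URL.getOrgDomainUrl().toExternalForm() + '/services/async/59.0/job');
req.setMethod('POST');
req.setHeader('X-SFDC-Session', UserInfo.getSessionId());
req.setHeader('Content-Type', 'application/xml; charset=UTF-8');
req.setBody(
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">' +
        '<operation>insert</operation>' +
        '<object>Account</object>' +
        '<concurrencyMode>Parallel</concurrencyMode>' +
        '<contentType>CSV</contentType>' +
    '</jobInfo>');
HttpResponse res = new Http().send(req);
System.debug(res.getBody()); // response contains the new job id
```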
3. Apex Batch — when load logic is complex (transformations, related-record creation, calculations).
```apex
public class DataMigration implements Database.Batchable<sObject>, Database.Stateful {
    public Iterable<sObject> start(Database.BatchableContext bc) {
        return [SELECT Id FROM Source_Object__c WHERE Migrated__c = false];
    }
    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // transform and insert into target
    }
    public void finish(Database.BatchableContext bc) {
        // wrap-up: notifications, marking the migration complete, etc.
    }
}
```

Kick it off from Anonymous Apex with an explicit batch size:

```apex
Database.executeBatch(new DataMigration(), 200);
```
Pros: full Apex flexibility, error recovery per batch, retryable. Cons: governor limits per batch; slower than Bulk API for raw insert.
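To make "error recovery per batch" concrete, here is the same skeleton fleshed out as a sketch: partial-success inserts (allOrNone = false) let a bad row fail on its own instead of rolling back the batch, and a failure counter survives across batches because of Database.Stateful. The Legacy_Name__c field and the Account mapping are assumptions for illustration.

```apex
public class DataMigration implements Database.Batchable<sObject>, Database.Stateful {
    private Integer failedRows = 0; // survives across batches thanks to Database.Stateful

    public Iterable<sObject> start(Database.BatchableContext bc) {
        // select whichever source fields the transformation needs
        return [SELECT Id, Legacy_Name__c FROM Source_Object__c WHERE Migrated__c = false];
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        List<Account> targets = new List<Account>();
        for (Source_Object__c src : (List<Source_Object__c>) scope) {
            targets.add(new Account(Name = src.Legacy_Name__c)); // illustrative mapping
        }
        // allOrNone = false: one bad row fails alone instead of failing the whole batch
        for (Database.SaveResult sr : Database.insert(targets, false)) {
            if (!sr.isSuccess()) {
                failedRows++;
            }
        }
    }

    public void finish(Database.BatchableContext bc) {
        System.debug(LoggingLevel.INFO, failedRows + ' rows failed; queue them for a retry pass.');
    }
}
```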
4. Suppress automation during load
A "Skip Automation" custom field on the User running the load, checked in every trigger:
```apex
trigger AccountTrigger on Account (...) {
    if (UserInfo.getUserType() == 'Standard' &&
        [SELECT Skip_Automation__c FROM User WHERE Id = :UserInfo.getUserId()].Skip_Automation__c) {
        return; // loading user: bypass automation
    }
    // normal logic
}
```
For one-off loads, set the flag, run the load, unset. Avoids trigger overhead.
5. External transformation pipeline
For really large loads (10M+), do the transform outside Salesforce:
Source DB -> Extract -> MuleSoft / Snowflake / Python -> Transform -> Bulk API to Salesforce
Salesforce only sees the final clean data; expensive transformations happen elsewhere.
6. Defer Sharing Calculations
For ownership-changing loads on Private OWD objects, sharing recalc can take hours.
Setup -> Defer Sharing Calculations -> Suspend ... run the load ... Setup -> Resume
Recalc happens once at the end instead of per record.
7. Disable validation rules and triggers temporarily
For load-only scenarios where you trust the data, mass-disable validation rules / triggers:
- Validation rules: deactivate via Metadata API or Setup.
- Triggers: many orgs have a "Trigger_Off__c" flag in a Custom Setting (see the sketch below).
Risky — easy to forget to re-enable. Document and use sparingly.
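A minimal sketch of that kill switch, assuming a hierarchy Custom Setting named Load_Settings__c with a Trigger_Off__c checkbox (both names illustrative):

```apex
trigger ContactTrigger on Contact (before insert, before update) {
    // Hierarchy Custom Setting: can be flipped org-wide or just for the loading user/profile
    Load_Settings__c cfg = Load_Settings__c.getInstance();
    if (cfg != null && cfg.Trigger_Off__c == true) {
        return; // load in progress: skip all trigger logic
    }
    // normal logic
}
```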
8. Big Objects for archive scenarios
If the load is historical data that won't be queried often, a Big Object stores it cheaply.
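The programmatic write path for Big Objects is Database.insertImmediate rather than standard DML; a quick sketch, assuming a custom Big Object Account_History__b with illustrative fields. Large historical loads would still typically go through the Bulk API, which supports Big Objects.

```apex
List<Account_History__b> archive = new List<Account_History__b>{
    new Account_History__b(
        Account_Id__c = '001000000000001AAA',   // illustrative values
        Snapshot_Date__c = Date.today(),
        Annual_Revenue__c = 1000000
    )
};
// Big Objects don't take standard DML insert; insertImmediate writes them synchronously
Database.insertImmediate(archive);
```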
9. Index-aware loading
Sort source data by key so duplicate-key checks hit the index efficiently. External Id-based upserts rely on the index that marking a field as an External Id creates, so set that up before the load, not after.
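For reference, this is what an External Id-based upsert looks like in Apex, assuming an illustrative Legacy_Key__c External Id field on Account:

```apex
// Legacy_Key__c is an illustrative custom External Id field on Account
List<Account> accounts = new List<Account>{
    new Account(Legacy_Key__c = 'A-0001', Name = 'Acme'),
    new Account(Legacy_Key__c = 'A-0002', Name = 'Globex')
};
// Matches on the indexed External Id: existing rows are updated, the rest inserted
upsert accounts Legacy_Key__c;
```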
10. Monitor and adjust
Watch sharing recalc, governor limit hits, trigger time. Adjust batch size if performance degrades.
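For Apex Batch loads, AsyncApexJob is the easiest thing to watch; a query like this (against the DataMigration class from pattern 3, run from Anonymous Apex or a monitoring script) shows batches processed and error counts as the job runs:

```apex
AsyncApexJob job = [
    SELECT Status, JobItemsProcessed, TotalJobItems, NumberOfErrors, ExtendedStatus
    FROM AsyncApexJob
    WHERE ApexClass.Name = 'DataMigration'
    ORDER BY CreatedDate DESC
    LIMIT 1
];
System.debug(job.Status + ': ' + job.JobItemsProcessed + '/' + job.TotalJobItems
    + ' batches, ' + job.NumberOfErrors + ' errors');
```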
Pre-load checklist:
- Test in Full Sandbox with full volume.
- Backup target objects (export current state).
- Confirm rollback plan.
- Schedule for low-traffic window.
- Notify users of any expected slowness.
- Have a runbook for what to do if the load hits limits mid-run.
Post-load:
- Validate row counts.
- Spot-check sample records.
- Confirm sharing recalc completed.
- Re-enable suppressed automation.
- Monitor for downstream issues.
A 1M+ load is more about preparation than execution. Get the prep right; execution is the easy part.
