Salesforce Dictionary
Salesforce Developer
hard

How would you architect a long-running Apex job processing 10M records?

10M records is firmly in Batch Apex territory. You can't process it in a single transaction.

Architecture:

  1. Implement `Database.Batchable<sObject>`. Three methods: start, execute, finish.

```apex
global class BulkProcessor implements Database.Batchable<sObject>, Database.Stateful {
    global Integer recordsProcessed = 0;

    global Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(
            [SELECT Id, Field__c FROM TargetObject__c WHERE Status__c = 'Pending']);
    }

    global void execute(Database.BatchableContext bc, List<TargetObject__c> scope) {
        for (TargetObject__c rec : scope) {
            // process each record
        }
        update scope;
        recordsProcessed += scope.size();
    }

    global void finish(Database.BatchableContext bc) {
        // Send completion email, log results, optionally chain another batch
    }
}
```

  2. Pick the batch size carefully. `Database.executeBatch(new BulkProcessor(), 200)` — 200 is the default scope size; the maximum is 2,000. Smaller scopes (50, 100) reduce governor pressure per `execute()` but increase the total number of batches.
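For example, launching the job with a reduced scope size (the value here is illustrative):

```apex
// Launch the batch with 100 records per execute() call.
// executeBatch returns the AsyncApexJob Id, useful for monitoring progress.
Id jobId = Database.executeBatch(new BulkProcessor(), 100);
```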
  3. Use `Database.QueryLocator` in `start()` — it bypasses the 50,000-row query limit (it can return up to 50 million records) and iterates lazily.
  4. The `Database.Stateful` marker interface preserves instance variables across `execute()` calls — useful for accumulating totals or referencing a master record set.
  5. Apply the bulk pattern within `execute()` — the usual rules: no SOQL or DML in loops; build collections; one DML statement per chunk.
  6. Handle errors per record:

```apex
Database.SaveResult[] results = Database.update(scope, false); // allOrNone = false
List<Failed_Record__c> failures = new List<Failed_Record__c>();
for (Integer i = 0; i < results.size(); i++) {
    if (!results[i].isSuccess()) {
        failures.add(new Failed_Record__c(...));
    }
}
if (!failures.isEmpty()) insert failures;
```

  7. Plan a recovery strategy. If a batch fails mid-flight, the `start()` query should be re-runnable — it should target only records that haven't been processed yet (e.g., `WHERE Status__c = 'Pending'`, setting `Status__c` to 'Processed' in `execute()`).
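One way to make re-runs safe is to flip the status flag in the same DML that saves the work, so any record the job touched is excluded from the next run (assuming the `Status__c` picklist from the query above):

```apex
global void execute(Database.BatchableContext bc, List<TargetObject__c> scope) {
    for (TargetObject__c rec : scope) {
        // ... do the actual processing ...
        rec.Status__c = 'Processed'; // mark done so a re-run skips this record
    }
    update scope; // one DML persists both the work and the progress marker
}
```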
  8. Schedule it. Use `System.schedule()` or Setup -> Apex Classes -> Schedule to fire at off-hours when org load is low.
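A minimal `Schedulable` wrapper for the `BulkProcessor` class above — the cron expression (seconds, minutes, hours, day-of-month, month, day-of-week) fires daily at 2 AM; the job name is illustrative:

```apex
global class BulkProcessorScheduler implements Schedulable {
    global void execute(SchedulableContext sc) {
        Database.executeBatch(new BulkProcessor(), 200);
    }
}
```

```apex
// Run once from Anonymous Apex to register the schedule.
System.schedule('Nightly Bulk Processing', '0 0 2 * * ?', new BulkProcessorScheduler());
```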
  9. Monitor it. Send notifications on success/failure and log job runs to a custom `Job_Log__c` object.
  10. Test thoroughly. A test class can only simulate a single chunk; for full-scale validation, test in a Full sandbox with realistic data volume.
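A test-method sketch: `Test.startTest()`/`Test.stopTest()` forces the one queued batch chunk to run synchronously before assertions (object and field names assumed from the example above):

```apex
@IsTest
private class BulkProcessorTest {
    @IsTest
    static void processesPendingRecords() {
        List<TargetObject__c> recs = new List<TargetObject__c>();
        for (Integer i = 0; i < 200; i++) {
            recs.add(new TargetObject__c(Status__c = 'Pending'));
        }
        insert recs;

        Test.startTest();
        Database.executeBatch(new BulkProcessor(), 200); // one chunk in a test context
        Test.stopTest(); // the batch finishes here

        // Assert on whatever post-conditions your processing logic guarantees,
        // e.g. that no records remain in 'Pending' status.
    }
}
```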

Trade-offs / alternatives:

  • Bulk API from outside — for one-time loads, consider running from outside Salesforce via Bulk API 2.0. Same processing, no Apex governor exposure. Good for migrations.
  • CDC + external processing — for ongoing streams, Change Data Capture plus a MuleSoft/Snowflake processor scales well beyond Apex governor limits.
  • Big Object archiving — if you're processing to delete old data, archive to Big Object first.

For 10M records as a one-off, Batch Apex with proper monitoring is the right answer.

Why this answer works

This is a senior, architect-level answer. The `Database.Stateful` usage, per-record error logging, and recovery considerations are the differentiators.
