Batch, Bulk API
Definition
Bulk API is the Salesforce REST-based interface built for moving large volumes of records in and out of an org. It handles insert, update, upsert, delete, hard delete, and query operations against datasets that run from a few thousand rows up to hundreds of millions. The work happens asynchronously. A client submits a job, Salesforce processes the records in the background, and the client polls for the results rather than waiting on a single synchronous response.
Bulk API 2.0 is the modern version and the one Salesforce recommends for new work. The original Bulk API (often called 1.0) made the client split data into batches by hand. Bulk API 2.0 removes that step. You upload one CSV file, Salesforce decides how to batch it internally, and you retrieve successful and failed rows when the job finishes. Salesforce guidance puts any operation above roughly 2,000 records in Bulk API territory.
How Bulk API moves data at scale
Why a separate API exists for high volume
The standard REST and SOAP APIs are synchronous. You send a request, the platform processes it inside one transaction, and you get an answer back on the same call. That model is fine for a handful of records. It falls apart when you need to load a million Accounts, because each call counts against the daily API request allocation and the round trips add up fast. Bulk API exists to sidestep that ceiling. It accepts a large payload in one upload, then processes the rows on the asynchronous layer where the platform can parallelize the work across server resources. The result is throughput that synchronous calls cannot match. A job that would take hours of chained REST calls can finish in minutes. This is why almost every serious data tool, including Data Loader, MuleSoft, and third-party ETL platforms, switches to Bulk API once the row count climbs. Salesforce documentation frames the decision simply. Operations above a few thousand records are good candidates for Bulk API, and smaller real-time operations belong on the synchronous REST or SOAP interfaces instead.
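The request-count argument can be sketched as simple arithmetic. This is a rough illustration, not a benchmark: one record per call models a plain single-record REST request, and the figure of 200 records per call is an assumption standing in for a batched synchronous endpoint.

```python
import math


def sync_calls_needed(record_count: int, records_per_call: int) -> int:
    """Number of synchronous requests needed to send every record once."""
    if records_per_call < 1:
        raise ValueError("records_per_call must be at least 1")
    return math.ceil(record_count / records_per_call)


ONE_MILLION = 1_000_000

# One record per request: every row is its own round trip.
single_record_calls = sync_calls_needed(ONE_MILLION, 1)

# Assumed batch size of 200 records per synchronous request.
batched_calls = sync_calls_needed(ONE_MILLION, 200)

print(single_record_calls, batched_calls)
```

A bulk job, by contrast, needs only a handful of calls per job (create, upload, close, a few status polls, and the result downloads), however many rows the CSV holds.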
Bulk API 2.0 versus the original Bulk API
The original Bulk API put batch management on the client. You opened a job, split your data into batches of up to 10,000 records each, added every batch yourself, closed the job, then tracked each batch to completion. It worked, but it was fiddly, and getting the batching wrong hurt performance. Bulk API 2.0, introduced in 2017, took that burden away. The Salesforce blog post that announced it framed the release as slimming down the workflow, and that is what happened. You create a job, upload one CSV payload, mark the upload complete, and Salesforce works out how to batch the data behind the scenes. The old flow's roughly nine steps collapsed to about seven, and the client no longer reasons about batch sizes at all. Both versions still ship and both are documented in the same developer guide. For new integrations, 2.0 is the default recommendation. The original API remains useful mainly for older clients and a few edge cases that depend on manual batch control.
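To make the old client-side burden concrete, here is a minimal sketch of the splitting step that the original Bulk API left to the caller. The function name is hypothetical, and the 10,000 cap is the per-batch figure quoted above. The sketch covers only the splitting, not the job and batch calls around it.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 10_000  # per-batch record cap described in the text


def split_into_batches(
    records: Sequence[T], batch_size: int = MAX_BATCH_SIZE
) -> Iterator[List[T]]:
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


# 25,000 placeholder rows become three batches.
batches = list(split_into_batches(list(range(25_000))))
print([len(batch) for batch in batches])
```

With Bulk API 2.0 this step disappears from client code, because the platform chooses the batching after the single CSV upload.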
The ingest job lifecycle
A Bulk API 2.0 ingest job moves through a defined set of states, and the client drives it with plain REST calls. You start by creating the job with a POST that names the object and the operation, such as insert or upsert. The job opens in the Open state and returns a job ID. Next you upload the record data as a CSV, where the first row holds the field API names and each later row is one record. When the upload finishes, you PATCH the job to UploadComplete, which tells Salesforce the data is ready. The platform then moves the job to InProgress and processes the rows. The client polls the job info endpoint until the state reads JobComplete or Failed. At that point you call two separate endpoints to download the successful records and the failed records. If you abort a job, its state changes to Aborted, the job is never queued, and any data already uploaded for it is discarded. Each phase has its own endpoint, which keeps the flow predictable.
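The polling phase of that lifecycle can be sketched as a small loop. This is an illustration under assumptions: `fetch_state` is a hypothetical callable that would wrap the job info request and return the job's state string, and the interval and poll cap are arbitrary defaults. The state names are the ones listed above, with Aborted treated as terminal as well.

```python
import time
from typing import Callable

# States after which the job will not change again.
TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}


def wait_for_job(
    fetch_state: Callable[[], str],
    interval_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Still Open, UploadComplete, or InProgress: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Usage with a fake state source instead of a real HTTP call.
fake_states = iter(["UploadComplete", "InProgress", "InProgress", "JobComplete"])
waits = []
final_state = wait_for_job(lambda: next(fake_states), interval_seconds=5, sleep=waits.append)
print(final_state, waits)
```

Injecting `sleep` keeps the loop testable without real delays. In production code the default `time.sleep` applies.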
Supported operations and the query path
Bulk API covers the full set of data operations an integration needs. Insert creates new records. Update changes existing ones by ID. Upsert matches on an external ID field and either updates a match or inserts a new record, which makes it the safest choice for repeatable loads. Delete sends records to the Recycle Bin, while hard delete removes them permanently and skips the bin entirely. Query is the read side. A bulk query runs a SOQL statement and returns the matching rows as a CSV result that the client downloads in pages. The query path matters for reporting and migration jobs that need to pull millions of rows out of Salesforce without tripping synchronous limits. For ingest, upsert against a stable external ID is the pattern that protects you from duplicates when a job is retried. Insert plus update logic built by hand is far harder to make idempotent, and at high volume that fragility shows up as duplicate records.
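The idempotency point can be shown with a toy in-memory model. This is not Salesforce behavior in full, only a sketch of the matching rule: `External_Id__c` is a hypothetical external ID field, and the "table" is a plain Python list.

```python
from typing import Dict, List

Row = Dict[str, str]


def insert_rows(table: List[Row], rows: List[Row]) -> None:
    """Insert always appends, so replaying the same rows duplicates them."""
    table.extend(dict(row) for row in rows)


def upsert_rows(table: List[Row], rows: List[Row], external_id_field: str) -> None:
    """Update the row with a matching external ID, otherwise append a new one."""
    index = {row[external_id_field]: pos for pos, row in enumerate(table)}
    for row in rows:
        key = row[external_id_field]
        if key in index:
            table[index[key]].update(row)
        else:
            index[key] = len(table)
            table.append(dict(row))


payload = [
    {"External_Id__c": "A-1", "Name": "Acme"},
    {"External_Id__c": "A-2", "Name": "Globex"},
]

inserted: List[Row] = []
insert_rows(inserted, payload)
insert_rows(inserted, payload)  # simulated retry of the same job

upserted: List[Row] = []
upsert_rows(upserted, payload, "External_Id__c")
upsert_rows(upserted, payload, "External_Id__c")  # simulated retry

print(len(inserted), len(upserted))
```

The retried insert doubles the row count, while the retried upsert leaves the table unchanged.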
Limits, allocations, and the 24-hour window
Bulk API has its own allocation, tracked separately from the request-based allocation that governs synchronous calls, although the REST calls that create, upload, and poll a job still count as ordinary API requests. For Bulk API 2.0 ingest, Salesforce caps the number of records you can process in a rolling 24-hour period. That ceiling sits in the range of roughly 100 to 150 million records depending on edition and the exact metric, so confirm the current figure for your org before planning around it. A single upload payload is also limited by size, with the CSV capped around 150 MB per job. Salesforce's documentation measures that cap after base64 encoding and recommends keeping the raw file to about 100 MB. The original Bulk API expressed its limits differently, in terms of batches per day and records per batch, which is one more reason the simpler 2.0 model is easier to plan against. Because the window rolls rather than resetting at midnight, a heavy overnight load can still throttle a morning job. Monitor consumption before scheduling large runs. Bulk API usage shows up in the org API usage views, and watching it keeps a big migration from stalling halfway through.
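A rolling window behaves differently from a midnight reset, and a small local tracker shows the difference. This is client-side bookkeeping under assumptions: the class name is hypothetical, the limit is whatever figure applies to your org, and loads are assumed to be recorded in chronological order. It does not reproduce how Salesforce computes usage.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class RollingRecordBudget:
    """Track records submitted in the trailing window (24 hours by default)."""

    def __init__(self, limit: int, window: timedelta = timedelta(hours=24)) -> None:
        self.limit = limit
        self.window = window
        self._loads: Deque[Tuple[datetime, int]] = deque()

    def record_load(self, when: datetime, record_count: int) -> None:
        # Assumes calls arrive in chronological order.
        self._loads.append((when, record_count))

    def used(self, now: datetime) -> int:
        cutoff = now - self.window
        # Drop loads that have aged out of the window.
        while self._loads and self._loads[0][0] <= cutoff:
            self._loads.popleft()
        return sum(count for _, count in self._loads)

    def remaining(self, now: datetime) -> int:
        return max(self.limit - self.used(now), 0)


budget = RollingRecordBudget(limit=100)  # tiny illustrative limit
budget.record_load(datetime(2024, 1, 1, 22, 0), 80)  # overnight load

morning = budget.remaining(datetime(2024, 1, 2, 8, 0))
next_night = budget.remaining(datetime(2024, 1, 2, 22, 0))
print(morning, next_night)
```

The overnight load still occupies most of the budget the next morning and only frees up once a full 24 hours have passed.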
PK Chunking for very large queries
Querying an object that holds hundreds of millions of rows can overwhelm the query engine if it tries to scan everything at once. PK Chunking is the answer. It splits a bulk query into smaller pieces based on the record ID, the primary key, so each chunk covers a bounded range of IDs and processes independently. In the original Bulk API you turn this on with a request header on the query job and tune the chunk size. In Bulk API 2.0, the platform applies chunking automatically for queries, so you generally do not manage it by hand. The payoff is reliability. Without chunking, a query across a very large table can fail with timeout or memory errors that are hard to diagnose. With it, the same query completes as a series of manageable chunks. Large data volume objects, including ones backed by Big Objects or tables with heavy history, are the usual reason you reach for this. It is a throughput and stability tool more than a feature you interact with directly.
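The idea behind chunking by primary key can be sketched with integer keys. This is a deliberate simplification: real Salesforce record IDs are 15- or 18-character strings, and the platform chooses the boundaries itself. The function name and chunk size are illustrative assumptions.

```python
from typing import Iterator, Tuple


def key_ranges(first_key: int, last_key: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) key ranges that cover first_key..last_key."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    low = first_key
    while low <= last_key:
        high = min(low + chunk_size - 1, last_key)
        yield (low, high)
        low = high + 1


# 250,000 keys in chunks of 100,000 give three bounded ranges.
ranges = list(key_ranges(1, 250_000, 100_000))
print(ranges)
```

Each range can then be queried independently, so one slow or failed chunk does not take the whole extraction down with it.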
Error handling, retries, and idempotency
Bulk API returns a per-row outcome, not a single pass or fail for the whole job. When a job completes, you download two result sets: one listing the records that succeeded and one listing the records that failed, each failed row carrying an error message. Common failures are validation rule errors, duplicate rule matches, missing required fields, and bad lookups. The right pattern is to read the failed-row file and resubmit only those rows, after fixing whatever caused the error. Resubmitting the entire job wastes your daily allocation and risks creating duplicates. This is where upsert against an external ID earns its keep, because rerunning the same payload updates the existing matches instead of inserting copies. A newer addition to Bulk API 2.0 brought job events and partial result downloads, which let a client react to state changes and pull results sooner rather than waiting for the full job. Building retry logic around the failed-row file, with upsert as the operation, is the difference between an integration that heals itself and one that needs a human every time a rule trips.
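The resubmit-only-failures pattern can be sketched as a small CSV transform. It assumes the failed-results file prefixes its platform-added columns with `sf__` (for example `sf__Id` and `sf__Error`) and repeats the original data columns after them. The function name and sample row are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Tuple


def build_retry_csv(failed_results_csv: str) -> Tuple[str, Counter]:
    """Strip platform-added columns and tally error messages.

    Returns the CSV to resubmit (original data columns only) and a
    Counter of error messages for triage.
    """
    reader = csv.DictReader(io.StringIO(failed_results_csv))
    data_fields = [name for name in (reader.fieldnames or []) if not name.startswith("sf__")]
    errors: Counter = Counter()
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=data_fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        errors[row.get("sf__Error", "")] += 1
        writer.writerow({name: row[name] for name in data_fields})
    return output.getvalue(), errors


failed = (
    '"sf__Id","sf__Error","External_Id__c","Name"\n'
    '"","REQUIRED_FIELD_MISSING:Required fields are missing: [Name]","A-7",""\n'
)
retry_csv, error_counts = build_retry_csv(failed)
print(retry_csv)
print(error_counts.most_common())
```

The retry file still contains the bad values, so the caller has to correct them (here, supply a Name) before resubmitting, ideally as an upsert.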
How to run a Bulk API 2.0 ingest job
Here is the shape of a Bulk API 2.0 ingest job using the REST endpoints. The flow is the same whether you call it from code or watch a tool like Data Loader do it for you.
- Create the job
Send a POST to the ingest jobs endpoint with the target object and the operation (insert, update, upsert, delete, or hardDelete). For upsert, name the external ID field. The response returns a job ID and an Open state.
- Upload the CSV data
PUT your record data to the job's batches endpoint as CSV. The header row lists field API names; each later row is one record. Keep the payload within the size limit, around 150 MB per job.
- Close the upload
PATCH the job state to UploadComplete. This signals Salesforce that the data is ready and lets the platform move the job to InProgress and start batching it internally.
- Poll for completion
Call the job info endpoint on an interval until the state reads JobComplete or Failed. Large jobs take minutes; do not poll aggressively, since each call is still an API request.
- Download the results
Retrieve the successful records and the failed records from their separate result endpoints. Parse the failed-row file, fix the errors, and resubmit only those rows.
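The five steps above map onto a short sequence of REST calls, sketched here as data rather than live requests. The API version is an assumption, the job ID is a made-up placeholder, and the paths reflect the ingest resource layout as commonly documented. Verify them against the developer guide for your release.

```python
from typing import List, Tuple

API_VERSION = "v60.0"  # assumption: substitute the version your org targets
INGEST_ROOT = f"/services/data/{API_VERSION}/jobs/ingest"


def ingest_request_plan(job_id: str) -> List[Tuple[str, str, str]]:
    """Return (step, HTTP method, path) for each call in the flow, in order."""
    job = f"{INGEST_ROOT}/{job_id}"
    return [
        ("create job", "POST", INGEST_ROOT),
        ("upload CSV", "PUT", f"{job}/batches"),
        ("close upload", "PATCH", job),  # body sets state to UploadComplete
        ("poll status", "GET", job),
        ("successful rows", "GET", f"{job}/successfulResults"),
        ("failed rows", "GET", f"{job}/failedResults"),
    ]


# Placeholder ID; in practice it comes back in the create-job response.
plan = ingest_request_plan("750xx0000000001AAA")
for step, method, path in plan:
    print(f"{step:16} {method:5} {path}")
```

Keeping the plan as plain tuples makes it easy to feed into whichever HTTP client and authentication scheme the integration already uses.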
object: The API name of the target sObject for the job, for example Account or a custom object like Invoice__c.
operation: One of insert, update, upsert, delete, or hardDelete. This sets what the job does with each row in the CSV.
externalIdFieldName: Required only for upsert. The external ID field used to match incoming rows to existing records so reruns update instead of duplicate.
contentType: The format of the uploaded data. For Bulk API 2.0 ingest this is CSV, and the upload itself is sent as text/csv.
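These job settings travel as a small JSON body on the create-job request. A minimal sketch of a builder that enforces the rules described above follows. The function name is hypothetical, `External_Id__c` is a made-up field, and the JSON keys (`object`, `operation`, `externalIdFieldName`, `contentType`) are the field names the create-job request is generally documented to use.

```python
import json
from typing import Dict, Optional

ALLOWED_OPERATIONS = {"insert", "update", "upsert", "delete", "hardDelete"}


def build_job_body(
    sobject: str, operation: str, external_id_field: Optional[str] = None
) -> str:
    """Return the JSON body for creating an ingest job, validating inputs first."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    if operation == "upsert" and not external_id_field:
        raise ValueError("upsert requires an external ID field")
    body: Dict[str, str] = {
        "object": sobject,
        "operation": operation,
        "contentType": "CSV",
    }
    if operation == "upsert":
        body["externalIdFieldName"] = external_id_field or ""
    return json.dumps(body)


job_body = build_job_body("Invoice__c", "upsert", "External_Id__c")
print(job_body)
```

Validating before the POST catches the most common mistake, an upsert without its matching field, without spending an API request on it.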
- Forgetting to PATCH the job to UploadComplete leaves it in Open forever; the platform never starts processing until you close the upload.
- The CSV header must use field API names, not labels. A label like Account Name instead of Name does not match any field, so the data is rejected instead of loaded.
- The daily record allocation runs on a rolling 24-hour window, so a large overnight load can throttle a job you start the next morning.
- Resubmitting a whole job after partial failure burns your allocation and can create duplicates; resubmit only the failed rows, ideally via upsert.
About the Author
Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
Test your knowledge
Q1. What is a Batch in the context of the Bulk API?
Q2. What is the maximum number of records per Batch in Bulk API 1.0?
Q3. What is the main reason to use the Bulk API instead of REST or SOAP?