What is the most important tip for working with Batch, Bulk API?

Quick fact: Bulk API 2.0 batches your data for you, so you upload one CSV and Salesforce decides how to split it internally.

Batch, Bulk API

Q: In Bulk API terminology, what is a Batch?

A Batch is a set of records submitted together inside a Bulk API Job. A Batchable Apex class, a cron entry, and a saved field mapping are unrelated concepts that share the word loosely.

Q: How does Bulk API 2.0 differ from Bulk API 1.0 in handling the payload?

Bulk API 2.0 accepts a single payload and chunks it server-side, whereas 1.0 makes the client split the data into batches of up to 10,000 records. Version 2.0 supports both DML operations and queries, and it remains asynchronous.

Q: What is the main reason to choose Bulk API over the regular REST API?

Bulk API is built for high-volume asynchronous loads that would burn through the daily request allocation if sent over REST. It is asynchronous rather than synchronous, it enforces validation just as REST does, and REST remains the better fit for single-record reads.

§ 01

Definition

Bulk API is the Salesforce REST-based interface built for moving large volumes of records in and out of an org. It handles insert, update, upsert, delete, hard delete, and query operations against datasets that run from a few thousand rows up to hundreds of millions. The work happens asynchronously. A client submits a job, Salesforce processes the records in the background, and the client polls for the results rather than waiting on a single synchronous response.

Bulk API 2.0 is the modern version and the one Salesforce recommends for new work. The original Bulk API (often called 1.0) made the client split data into batches by hand. Bulk API 2.0 removes that step. You upload one CSV file, Salesforce decides how to batch it internally, and you retrieve successful and failed rows when the job finishes. Salesforce guidance puts any operation above roughly 2,000 records in Bulk API territory.

§ 02

How Bulk API moves data at scale

Why a separate API exists for high volume

The standard REST and SOAP APIs are synchronous. You send a request, the platform processes it inside one transaction, and you get an answer back on the same call. That model is fine for a handful of records. It falls apart when you need to load a million Accounts, because each call counts against the daily API request allocation and the round trips add up fast. Bulk API exists to sidestep that ceiling. It accepts a large payload in one upload, then processes the rows on the asynchronous layer where the platform can parallelize the work across server resources. The result is throughput that synchronous calls cannot match. A job that would take hours of chained REST calls can finish in minutes. This is why almost every serious data tool, including Data Loader, MuleSoft, and third-party ETL platforms, switches to Bulk API once the row count climbs. Salesforce documentation frames the decision simply. Operations above a few thousand records are good candidates for Bulk API, and smaller real-time operations belong on the synchronous REST or SOAP interfaces instead.
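
To put rough numbers on the round-trip problem, here is a back-of-the-envelope sketch. The function name is illustrative, and the 200-records-per-call figure is an assumption standing in for a batched synchronous request; the one-record-per-call case models a naive REST loop.

```python
def calls_needed(record_count: int, records_per_call: int = 1) -> int:
    """Synchronous calls required to send record_count records."""
    if record_count < 0 or records_per_call < 1:
        raise ValueError("record_count must be >= 0 and records_per_call >= 1")
    # Ceiling division without floating point.
    return -(-record_count // records_per_call)


naive = calls_needed(1_000_000)         # one record per call
batched = calls_needed(1_000_000, 200)  # assumed 200 records per call
print(naive, batched)
```

Even the batched case needs thousands of round trips, each counted against the daily allocation. A Bulk API 2.0 job carries the same rows in one upload plus a handful of control calls.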

Bulk API 2.0 versus the original Bulk API

The original Bulk API put batch management on the client. You opened a job, split your data into batches of up to 10,000 records each, added every batch yourself, closed the job, then tracked each batch to completion. It worked, but it was fiddly, and getting the batching wrong hurt performance. Bulk API 2.0, introduced in 2017, took that burden away. The Salesforce blog that announced it was titled around slimming down the workflow, and that is exactly what happened. You create a job, upload one CSV payload, mark the upload complete, and Salesforce figures out the most efficient way to batch the data behind the scenes. The nine-ish steps of the old flow collapsed to about seven, and the client no longer reasons about batch sizes at all. Both versions still ship and both are documented in the same developer guide. For new integrations, 2.0 is the default recommendation. The original API remains useful mainly for older clients and a few edge cases that depend on manual batch control.
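
The client-side work that the original Bulk API required can be sketched as follows. This is a minimal illustration of the splitting step only, assuming records are held in memory as dictionaries; the function name is hypothetical and nothing here talks to Salesforce.

```python
from typing import Iterator, List, Sequence

MAX_RECORDS_PER_BATCH = 10_000  # per-batch record cap described above


def split_into_batches(
    records: Sequence[dict], batch_size: int = MAX_RECORDS_PER_BATCH
) -> Iterator[List[dict]]:
    """Yield consecutive slices of records, each at most batch_size long."""
    if not 1 <= batch_size <= MAX_RECORDS_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 10,000")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


records = [{"Name": f"Acme {i}"} for i in range(25_000)]
batch_lengths = [len(batch) for batch in split_into_batches(records)]
print(batch_lengths)
```

With Bulk API 2.0 this step disappears from the client, because the platform decides how to batch the single uploaded payload.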

The ingest job lifecycle

A Bulk API 2.0 ingest job moves through a defined set of states, and the client drives it with plain REST calls. You start by creating the job with a POST that names the object and the operation, such as insert or upsert. The job opens in the Open state and returns a job ID. Next you upload the record data as a CSV, where the first row holds the field API names and each later row is one record. When the upload finishes, you PATCH the job to UploadComplete, which tells Salesforce the data is ready. The platform then moves the job to InProgress and processes the rows. The client polls the job info endpoint until the state reads JobComplete or Failed. At that point you call two separate endpoints to download the successful records and the failed records. If you abort a job, its state changes to Aborted, the job is never queued, and any data already uploaded for it is discarded. Each phase has its own endpoint, which keeps the flow predictable.
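
The polling part of the lifecycle can be sketched like this. It is a minimal sketch, not a full client: fetch_state is a hypothetical callable that you would implement as a GET on the job info endpoint, and the interval and poll cap are illustrative defaults. Aborted is treated as terminal alongside JobComplete and Failed so the loop also stops if someone cancels the job.

```python
import time
from typing import Callable

TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}


def wait_for_job(
    fetch_state: Callable[[], str],
    interval_seconds: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval_seconds)  # each poll is an API call, so pace them
    raise TimeoutError("job did not reach a terminal state in time")


# Simulated run: the states a job might report on successive polls.
simulated = iter(["UploadComplete", "InProgress", "JobComplete"])
final_state = wait_for_job(lambda: next(simulated), sleep=lambda _: None)
print(final_state)
```

Injecting sleep keeps the loop testable without waiting; in real use you would leave the default in place.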

Supported operations and the query path

Bulk API covers the full set of data operations an integration needs. Insert creates new records. Update changes existing ones by ID. Upsert matches on an external ID field and either updates a match or inserts a new record, which makes it the safest choice for repeatable loads. Delete sends records to the Recycle Bin, while hard delete removes them permanently and skips the bin entirely. Query is the read side. A bulk query runs a SOQL statement and returns the matching rows as a CSV result that the client downloads in pages. The query path matters for reporting and migration jobs that need to pull millions of rows out of Salesforce without tripping synchronous limits. For ingest, upsert against a stable external ID is the pattern that protects you from duplicates when a job is retried. Insert plus update logic built by hand is far harder to make idempotent, and at high volume that fragility shows up as duplicate records.
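
The read side can be sketched in the same spirit. The example below builds a query job request body and walks result pages. fetch_page is a hypothetical callable that would wrap the results request and return the CSV text plus the locator for the next page; treating a missing locator or the literal string "null" as the end of the results is an assumption about how the last page is signalled.

```python
import json
from typing import Callable, Iterator, Optional, Tuple


def build_query_job_body(soql: str) -> str:
    """JSON body for creating a bulk query job."""
    return json.dumps({"operation": "query", "query": soql})


def iter_result_pages(
    fetch_page: Callable[[Optional[str]], Tuple[str, Optional[str]]]
) -> Iterator[str]:
    """Yield CSV pages until the service reports no further locator."""
    locator: Optional[str] = None
    while True:
        csv_text, next_locator = fetch_page(locator)
        yield csv_text
        if not next_locator or next_locator == "null":
            break
        locator = next_locator


# Simulated two-page result set keyed by locator.
fake_pages = {None: ("Id\n001A\n", "abc"), "abc": ("Id\n001B\n", "null")}
pages = list(iter_result_pages(lambda locator: fake_pages[locator]))
print(len(pages))
```

Streaming page by page keeps memory flat, which matters when the query returns millions of rows.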

Limits, allocations, and the 24-hour window

Bulk API has its own limits, tracked separately from the per-call cost of synchronous requests, though Bulk API calls still count toward the org's daily API request allocation. For Bulk API 2.0 ingest, Salesforce caps the number of records you can process in a rolling 24-hour period. The documented ceiling has been on the order of 100 to 150 million records, so confirm the current figure for your org in the Limits reference. A single upload payload is limited by size, with the CSV capped around 150 MB per job. The original Bulk API expressed its limits differently, in terms of batches per day and records per batch, which is one more reason the simpler 2.0 model is easier to plan against. Because the window rolls rather than resetting at midnight, a heavy overnight load can still throttle a morning job. Monitor consumption before scheduling large runs. Bulk API usage shows up in the org API usage views, and watching it keeps a big migration from stalling halfway through.
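
When a dataset is larger than one payload allows, the client has to split it into several jobs. The sketch below shows one way to do that, assuming each row is already a serialized CSV line and that max_bytes is a budget you choose with some headroom under the per-job cap. The function name and the tiny budget in the example are illustrative only.

```python
from typing import Iterator, List


def split_csv_by_size(header: str, rows: List[str], max_bytes: int) -> Iterator[str]:
    """Yield CSV payloads (header plus rows) that each fit within max_bytes."""
    header_line = header + "\n"
    header_size = len(header_line.encode("utf-8"))
    current: List[str] = []
    current_size = header_size
    for row in rows:
        line = row + "\n"
        size = len(line.encode("utf-8"))
        if header_size + size > max_bytes:
            raise ValueError("a single row exceeds the payload budget")
        if current and current_size + size > max_bytes:
            # Budget reached: emit this payload and start a new one.
            yield header_line + "".join(current)
            current, current_size = [], header_size
        current.append(line)
        current_size += size
    if current:
        yield header_line + "".join(current)


payloads = list(split_csv_by_size("Name", ["aaaa", "bbbb", "cccc"], max_bytes=15))
print(len(payloads))
```

Each payload repeats the header row so it can be uploaded as an independent job. The rolling 24-hour record cap still applies across all of them.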

PK Chunking for very large queries

Querying an object that holds hundreds of millions of rows can overwhelm the query engine if it tries to scan everything at once. PK Chunking is the answer. It splits a bulk query into smaller pieces based on the record ID, the primary key, so each chunk covers a bounded range of IDs and processes independently. In the original Bulk API you turn this on with a request header on the query job and tune the chunk size. In Bulk API 2.0, the platform applies chunking automatically for queries, so you generally do not manage it by hand. The payoff is reliability. Without chunking, a query across a very large table can fail with timeout or memory errors that are hard to diagnose. With it, the same query completes as a series of manageable chunks. Large data volume objects, including ones backed by Big Objects or tables with heavy history, are the usual reason you reach for this. It is a throughput and stability tool more than a feature you interact with directly.
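
For the original Bulk API, the opt-in is a request header on the query job. The sketch below builds that header and, separately, illustrates the idea of bounded key ranges. The chunk size of 100,000 is an illustrative value, and chunk_ranges works on plain integers purely to show the concept; real Salesforce record IDs are not integers, and the platform computes the boundaries itself.

```python
from typing import Dict, List, Tuple


def pk_chunking_header(chunk_size: int = 100_000) -> Dict[str, str]:
    """Request header that enables PK Chunking on a Bulk API (1.0) query job."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return {"Sforce-Enable-PKChunking": f"chunkSize={chunk_size}"}


def chunk_ranges(min_key: int, max_key: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Conceptual only: split an inclusive key range into bounded sub-ranges."""
    ranges = []
    start = min_key
    while start <= max_key:
        end = min(start + chunk_size - 1, max_key)
        ranges.append((start, end))
        start = end + 1
    return ranges


print(pk_chunking_header(50_000))
print(chunk_ranges(1, 250, 100))
```

Each range becomes its own bounded query, which is why a scan that would time out as one statement can complete as a series of smaller ones.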

Error handling, retries, and idempotency

Bulk API returns a per-row outcome, not a single pass or fail for the whole job. When a job completes, you download two result sets: one listing the records that succeeded and one listing the records that failed, each failed row carrying an error message. Common failures are validation rule errors, duplicate rule matches, missing required fields, and bad lookups. The right pattern is to read the failed-row file and resubmit only those rows, after fixing whatever caused the error. Resubmitting the entire job wastes your daily allocation and risks creating duplicates. This is where upsert against an external ID earns its keep, because rerunning the same payload updates the existing matches instead of inserting copies. A newer addition to Bulk API 2.0 brought job events and partial result downloads, which let a client react to state changes and pull results sooner rather than waiting for the full job. Building retry logic around the failed-row file, with upsert as the operation, is the difference between an integration that heals itself and one that needs a human every time a rule trips.
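
The resubmit-only-failures pattern can be sketched as a small CSV transform. This assumes the failed-results file carries result columns named sf__Id and sf__Error ahead of the original data columns; the function name and the External_Id__c field are hypothetical. The function only strips the result columns, so fixing the underlying data errors is still up to you before the retry.

```python
import csv
import io

# Columns Salesforce adds to result files; they must not be sent back.
RESULT_COLUMNS = {"sf__Id", "sf__Error", "sf__Created"}


def build_retry_csv(failed_results_csv: str) -> str:
    """Return a CSV of the failed rows with the result columns removed."""
    reader = csv.DictReader(io.StringIO(failed_results_csv))
    data_fields = [f for f in (reader.fieldnames or []) if f not in RESULT_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=data_fields, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


failed = (
    '"sf__Id","sf__Error","External_Id__c","Name"\n'
    '"","REQUIRED_FIELD_MISSING:Required fields are missing: [Name]","A-1",""\n'
)
retry_csv = build_retry_csv(failed)
print(retry_csv)
```

Submitting the corrected output as an upsert on the external ID keeps the retry safe to run more than once.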

§ 03

How to run a Bulk API 2.0 ingest job

Here is the shape of a Bulk API 2.0 ingest job using the REST endpoints. The flow is the same whether you call it from code or watch a tool like Data Loader do it for you.

1. Create the job. Send a POST to the ingest jobs endpoint with the target object and the operation (insert, update, upsert, delete, or hardDelete). For upsert, name the external ID field. The response returns a job ID and an Open state.
2. Upload the CSV data. PUT your record data to the job's batches endpoint as CSV. The header row lists field API names; each later row is one record. Keep the payload within the size limit, around 150 MB per job.
3. Close the upload. PATCH the job state to UploadComplete. This signals Salesforce that the data is ready and lets the platform move the job to InProgress and start batching it internally.
4. Poll for completion. Call the job info endpoint on an interval until the state reads JobComplete or Failed. Large jobs take minutes; do not poll aggressively, since each call is still an API request.
5. Download the results. Retrieve the successful records and the failed records from their separate result endpoints. Parse the failed-row file, fix the errors, and resubmit only those rows.
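
The steps above map onto a short sequence of HTTP requests, sketched here as data so the shape is visible without a live org. The API version and the job ID are placeholder assumptions; substitute the version your org uses and the ID returned by the create call. Authentication headers and the instance host are left out.

```python
import json
from typing import List, Optional, Tuple

API_VERSION = "v58.0"  # assumption: use the version your org supports

# Body for the PATCH that closes the upload.
CLOSE_UPLOAD_BODY = json.dumps({"state": "UploadComplete"})


def ingest_requests(
    job_id: str, api_version: str = API_VERSION
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (method, path, content type) for each step of an ingest job."""
    base = f"/services/data/{api_version}/jobs/ingest"
    job = f"{base}/{job_id}"
    return [
        ("POST", base, "application/json"),          # create the job
        ("PUT", f"{job}/batches", "text/csv"),       # upload the CSV data
        ("PATCH", job, "application/json"),          # close the upload
        ("GET", job, None),                          # poll job info
        ("GET", f"{job}/successfulResults/", None),  # download successes
        ("GET", f"{job}/failedResults/", None),      # download failures
    ]


for method, path, content_type in ingest_requests("750xx0000000001AAA"):
    print(method, path, content_type or "")
```

Keeping the plan as plain tuples makes it easy to feed into whichever HTTP client you already use.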

Request fields

object (required)

The API name of the target sObject for the job, for example Account or a custom object like Invoice__c.

operation (required)

One of insert, update, upsert, delete, or hardDelete. This sets what the job does with each row in the CSV.

externalIdFieldName (required for upsert only)

The external ID field used to match incoming rows to existing records so reruns update instead of duplicate.

contentType (optional)

The format of the uploaded data. Bulk API 2.0 ingest accepts CSV, which is the default when the field is omitted, and the upload itself is sent as text/csv.
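
These fields come together in the JSON body of the create-job request. The sketch below assembles and validates that body; the function name is hypothetical, and External_Id__c is a made-up field used only for the example.

```python
import json
from typing import Optional

VALID_OPERATIONS = {"insert", "update", "upsert", "delete", "hardDelete"}


def build_ingest_job_body(
    sobject: str, operation: str, external_id_field: Optional[str] = None
) -> str:
    """Build the JSON body for creating a Bulk API 2.0 ingest job."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    if operation == "upsert" and not external_id_field:
        raise ValueError("upsert requires an external ID field")
    body = {"object": sobject, "operation": operation, "contentType": "CSV"}
    if operation == "upsert":
        body["externalIdFieldName"] = external_id_field
    return json.dumps(body)


upsert_body = build_ingest_job_body("Account", "upsert", "External_Id__c")
print(upsert_body)
```

Validating the upsert case up front fails fast on the client instead of spending an API call on a request the platform would reject.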

Gotchas

Forgetting to PATCH the job to UploadComplete leaves it stuck in Open; the platform never starts processing until you close the upload.
The CSV header must use field API names, not labels. A label such as Account Name in place of Name does not match any field, so the data fails to load.
The daily record allocation runs on a rolling 24-hour window, so a large overnight load can throttle a job you start the next morning.
Resubmitting a whole job after partial failure burns your allocation and can create duplicates; resubmit only the failed rows, ideally via upsert.

Trust & references

Sources

Cross-checked against the following references.

Bulk API 2.0 | Bulk API 2.0 and Bulk API Developer Guide (Salesforce)
Slim Down with the New Bulk API v2 (Salesforce)

Official documentation

Straight from the source - Salesforce's reference material on Batch, Bulk API.

Bulk API 2.0 Ingest | Bulk API 2.0 and Bulk API Developer Guide (Salesforce)
Limits | Bulk API 2.0 and Bulk API Developer Guide (Salesforce)

About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
