
Failed to load batch — InvalidBatch: invalid CSV header / unrecognized field

Your Bulk API job's CSV has a column the target object doesn't have, or the header row is malformed. The job's error message names the column that failed to match; `failedResults` lists any rows that were rejected individually. Fix the column names to match Salesforce field API names exactly, including case and the `__c` suffix on custom fields.

Also seen as: InvalidBatch · invalid CSV header · unrecognized field bulk api · Bulk API invalid field

You exported a million Contact records from your legacy CRM, transformed them through a Python script, and pushed the resulting CSV into Salesforce via Bulk API 2.0. Three batches in, the job returns Failed to load batch: InvalidBatch: invalid CSV header / unrecognized field. The data looks fine in a spreadsheet editor. Nothing in the file is obviously broken. You have a deadline in three hours and a CSV with eight hundred thousand rows that the platform refuses to read.

What the Bulk API is actually checking

The Bulk API parses CSV header rows by name against the SObject's field definitions. Every column header in your file must match a field API name on the target object or be a recognized special column like a polymorphic reference or a relationship lookup. Headers that don't match cause the entire batch to fail before any record is processed.

The platform doesn't do fuzzy matching. Account_Name is not the same as AccountName. Email__c is not the same as Email. Case usually matters, even when the org's standard fields seem case-insensitive. Trailing whitespace in a header name breaks the match.
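To see how strict an exact comparison is, here is a minimal Python sketch of a helper (the name `explain_mismatch` is hypothetical, not part of any Salesforce library) that takes one header and the set of valid API names and reports why an exact match failed: an invisible prefix or whitespace, a case-only difference, or a label-style spelling. It assumes you already have the API names, for example from a describe call.

```python
from typing import Optional, Set


def explain_mismatch(header: str, api_names: Set[str]) -> Optional[str]:
    """Return None if `header` matches exactly, otherwise a hint about why not."""
    if header in api_names:
        return None
    # Drop a leading BOM character (U+FEFF) and surrounding whitespace.
    cleaned = header.lstrip('\ufeff').strip()
    if cleaned != header and cleaned in api_names:
        return f'{header!r} matches {cleaned!r} only after removing a BOM or whitespace'
    # Case-only difference, e.g. 'firstname' vs 'FirstName'.
    by_lower = {name.lower(): name for name in api_names}
    if cleaned.lower() in by_lower:
        return f'{header!r} differs from {by_lower[cleaned.lower()]!r} only in case'
    # Label-style spelling, e.g. 'First Name' or 'First_Name' vs 'FirstName'.
    squashed = cleaned.replace(' ', '').replace('_', '').lower()
    for name in sorted(api_names):
        if name.replace('_', '').lower() == squashed:
            return f'{header!r} looks like a label for {name!r}'
    return f'{header!r} has no close match on this object'


contact = {'FirstName', 'LastName', 'Email', 'AccountId', 'OwnerId'}
for h in ['Email', 'Email ', 'firstname', 'First Name', 'Account Name']:
    print(explain_mismatch(h, contact))
```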

A second category of error fires when the CSV format itself is malformed: a quoted field with an unescaped quote inside, a row with the wrong column count, or a byte-order mark (BOM) at the start of the file that the parser doesn't strip. These also surface as InvalidBatch, with messages that mention a malformed row.
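One way to catch the malformed-row category locally is Python's `csv` module in strict mode, which raises on a stray quote instead of guessing. This sketch (the function name `find_malformed_rows` is hypothetical) reports rows whose column count differs from the header and stops at the first quoting error, since anything parsed after one is unreliable.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_malformed_rows(lines: Iterable[str], limit: int = 20) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for rows that won't parse cleanly.

    `lines` is any iterable of text lines, such as a file opened with newline=''.
    """
    problems: List[Tuple[int, str]] = []
    # strict=True makes the reader raise csv.Error on bad quoting instead of guessing.
    reader = csv.reader(lines, strict=True)
    try:
        header = next(reader, None)
        if header is None:
            return [(0, 'file is empty')]
        for row in reader:
            if len(row) != len(header):
                problems.append((reader.line_num,
                                 f'{len(row)} column(s), header has {len(header)}'))
                if len(problems) >= limit:
                    break
    except csv.Error as exc:
        # Once quoting goes wrong, later rows can't be trusted, so stop here.
        problems.append((reader.line_num, f'parse error: {exc}'))
    return problems


sample = io.StringIO(
    'FirstName,LastName\n'
    'Alice,Lee\n'
    'Bob\n'                    # too few columns
    '"She said "hi"",Wei\n'    # unescaped quote inside a quoted field
)
for line_no, problem in find_malformed_rows(sample):
    print(line_no, problem)
```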

The broken example

A CSV exported from a tool that used display labels instead of API names:

```csv
First Name,Last Name,Email,Account Name,Owner
Alice,Lee,alice@acme.com,Acme Corp,jane.smith@acme.com
Bob,Khan,bob@beta.io,Beta Industries,jane.smith@acme.com
Carol,Wei,carol@gamma.net,Gamma Holdings,jane.smith@acme.com
```

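The headers read like fields a salesperson would recognize, but only Email happens to be a field API name. The Contact object has FirstName, LastName, Email, AccountId, and OwnerId. The Bulk API doesn't know what to do with First Name, and renaming Account Name and Owner isn't enough on its own: AccountId and OwnerId expect record ids, not company names or email addresses.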

A second shape, an export from a transformation script that introduced an artifact:

```csv
FirstName,LastName,Email,Phone
Alice,Lee,alice@acme.com,415-555-0101
Bob,Khan,bob@beta.io,415-555-0102
```

The file starts with the UTF-8 byte-order mark: the bytes EF BB BF, which decode to the invisible character U+FEFF at position zero. The parser therefore reads the first header as U+FEFF followed by FirstName, and the platform doesn't match that to anything on Contact.
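You can confirm this shape without a hex editor. A minimal Python sketch (the helper names are hypothetical): check whether the first three bytes equal `codecs.BOM_UTF8`, and compare how the header decodes with plain `utf-8` versus `utf-8-sig`.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """True if `data` begins with the UTF-8 byte-order mark, bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of the file, so this is cheap on huge CSVs."""
    with open(path, 'rb') as f:
        return starts_with_utf8_bom(f.read(3))


# What a BOM-prefixed header looks like once decoded.
raw = codecs.BOM_UTF8 + b'FirstName,LastName,Email,Phone\n'
first_header = raw.decode('utf-8').split(',')[0]
print(repr(first_header))                          # U+FEFF survives plain utf-8
print(repr(raw.decode('utf-8-sig').split(',')[0]))  # utf-8-sig drops it
```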

A third shape:

```csv
FirstName,LastName,Email,Account.Industry
Alice,Lee,alice@acme.com,Manufacturing
Bob,Khan,bob@beta.io,Healthcare
```

The author assumed a Contact insert could write a parent field by referencing Account.Industry. Bulk API ingest does not update parent records. A dotted header works as a relationship reference through an External ID field on the parent (like Account.External_Id__c): it tells the API which parent record to link, and it never writes to that parent.

Three paths to a fix

The three fixes ranked by frequency:

Replace display labels with field API names. Open your target object in Setup and copy each field's "Field Name" value (in Object Manager, it's listed under Fields & Relationships). Standard fields use names like FirstName, LastName, Email, AccountId. Custom fields end in __c: Industry_Vertical__c, Renewal_Date__c. Rewrite your CSV header row using the API names. If the labels came from an export tool, fix its field mapping and re-export; otherwise rewrite the header row with a script. Do not edit a million-row file by hand, or you'll introduce other bugs.
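A minimal Python sketch of rewriting only the header row with a script. The `LABEL_TO_API` mapping and the `rename_header_row` name are hypothetical; build the mapping from your own export. It raises on any label it doesn't know rather than guessing, and it leaves data rows untouched, so columns like Account Name still need ids or an external-ID reference before they can load.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical mapping for the label-style Contact export shown earlier.
LABEL_TO_API: Dict[str, str] = {
    'First Name': 'FirstName',
    'Last Name': 'LastName',
    'Email': 'Email',
}


def rename_header_row(src: TextIO, dst: TextIO, mapping: Dict[str, str]) -> None:
    """Copy CSV from src to dst, replacing header labels with API names."""
    reader = csv.reader(src)
    # LF line endings, assuming the Bulk API 2.0 job keeps its default lineEnding (LF).
    writer = csv.writer(dst, lineterminator='\n')
    header = next(reader)
    unmapped = [h for h in header if h not in mapping]
    if unmapped:
        raise KeyError(f'No API name mapped for: {unmapped}')
    writer.writerow([mapping[h] for h in header])
    # Data rows stream through one at a time; the file is never held in memory.
    writer.writerows(reader)


src = io.StringIO('First Name,Last Name,Email\nAlice,Lee,alice@acme.com\n')
out = io.StringIO()
rename_header_row(src, out, LABEL_TO_API)
print(out.getvalue(), end='')
```

For real files, open the source with `encoding='utf-8-sig', newline=''` (which also drops any BOM) and the destination with `encoding='utf-8', newline=''`.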

Strip the byte-order mark. If your file was created by Excel or by a script that wrote UTF-8 with a BOM, the first three bytes of the file (EF BB BF) are an invisible marker. Many CSV parsers silently consume the BOM, but the Bulk API does not. In Python, `open(filename, 'r', encoding='utf-8-sig')` reads the file with the BOM stripped. In Node, read the file as a string and remove a leading U+FEFF with `text.replace(/^\uFEFF/, '')`. On a Unix shell with GNU sed, `sed '1s/^\xEF\xBB\xBF//' in.csv > out.csv` removes the three BOM bytes from the first line.
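If you'd rather fix the file on disk than change how it is read, a minimal Python sketch (the `strip_utf8_bom` name is hypothetical) copies the file byte for byte and drops only a leading BOM. It streams the data with `shutil.copyfileobj`, so it works on files larger than memory; the source and destination must be different paths.

```python
import codecs
import os
import shutil
import tempfile


def strip_utf8_bom(src: str, dst: str) -> bool:
    """Copy src to dst without a leading UTF-8 BOM. Returns True if one was removed."""
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        head = fin.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            fout.write(head)  # not a BOM: keep these bytes
        shutil.copyfileobj(fin, fout)  # stream the rest in chunks
    return had_bom


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'with_bom.csv')
    dst = os.path.join(tmp, 'clean.csv')
    with open(src, 'wb') as f:
        f.write(codecs.BOM_UTF8 + b'FirstName,LastName\nAlice,Lee\n')
    removed = strip_utf8_bom(src, dst)
    with open(dst, 'rb') as f:
        cleaned = f.read()
print(removed, cleaned[:9])
```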

Use external-id syntax for relationship lookups. If you need to set a Contact's Account from a value in your source system, make sure the Account field holding that value is marked External ID in Setup, then write the CSV column header as Account.External_Id__c (where External_Id__c is that field's API name). The Bulk API resolves the value to an AccountId at write time. For a custom lookup field, use the relationship name ending in __r (for example, Parent_Account__r.External_Id__c). You cannot write arbitrary parent fields like Account.Industry during a Contact insert.
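The relationship name isn't always the lookup field's name minus `Id`, so it's safer to read it from describe metadata. A minimal sketch, assuming the `fields` lists returned by describe calls (for example `sf.Contact.describe()['fields']` in simple_salesforce), whose entries carry `name`, `type`, `relationshipName`, `referenceTo`, and `externalId` keys. The helper name and the trimmed sample dicts are hypothetical, and polymorphic lookups are out of scope.

```python
from typing import Dict, List


def relationship_header(child_fields: List[Dict], lookup_field: str,
                        parent_fields: List[Dict], parent_external_id: str) -> str:
    """Build a header like 'Account.Legacy_Account_Id__c' from describe field metadata."""
    lookup = next((f for f in child_fields if f['name'] == lookup_field), None)
    if lookup is None or lookup.get('type') != 'reference':
        raise ValueError(f'{lookup_field} is not a lookup field on the child object')
    if len(lookup.get('referenceTo') or []) != 1:
        # Lookups that can point at more than one object type aren't handled here.
        raise ValueError(f'{lookup_field} is polymorphic; out of scope for this sketch')
    ext = next((f for f in parent_fields if f['name'] == parent_external_id), None)
    if ext is None or not ext.get('externalId'):
        raise ValueError(f'{parent_external_id} is not an External ID field on the parent')
    # relationshipName is 'Account' for AccountId, 'Parent_Account__r' for Parent_Account__c.
    return f"{lookup['relationshipName']}.{parent_external_id}"


# Trimmed, hypothetical describe output: real field dicts carry many more keys.
contact_fields = [
    {'name': 'AccountId', 'type': 'reference', 'relationshipName': 'Account',
     'referenceTo': ['Account']},
    {'name': 'Email', 'type': 'email', 'relationshipName': None, 'referenceTo': []},
]
account_fields = [
    {'name': 'Legacy_Account_Id__c', 'externalId': True},
    {'name': 'Industry', 'externalId': False},
]
print(relationship_header(contact_fields, 'AccountId',
                          account_fields, 'Legacy_Account_Id__c'))
```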

The fixed example

The cleaned-up CSV that the Bulk API accepts:

```csv
FirstName,LastName,Email,Phone,AccountId,OwnerId
Alice,Lee,alice@acme.com,415-555-0101,001xx000003DGb1,005xx000001Sv8z
Bob,Khan,bob@beta.io,415-555-0102,001xx000003DGb2,005xx000001Sv8z
Carol,Wei,carol@gamma.net,415-555-0103,001xx000003DGb3,005xx000001Sv8z
```

Headers are API names. AccountId values are the 15-character or 18-character Salesforce ids of existing Account records. OwnerId is a user id. The BOM was stripped at file write time.
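Because these columns must hold ids, a cheap pre-flight check catches rows where a name or email slipped into AccountId or OwnerId. A sketch, assuming the standard key prefixes `001` (Account) and `005` (User); it checks shape only, not whether the record exists or whether the running user can see it. The `looks_like_id` name is hypothetical.

```python
import re

# Record ids are 15 (case-sensitive) or 18 (case-insensitive) alphanumeric characters.
SF_ID = re.compile(r'[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?')


def looks_like_id(value: str, key_prefix: str) -> bool:
    """True if value has record-id shape and starts with the object's 3-character key prefix."""
    return SF_ID.fullmatch(value) is not None and value.startswith(key_prefix)


print(looks_like_id('001xx000003DGb1', '001'))  # Account id shape
print(looks_like_id('Acme Corp', '001'))        # a name in the AccountId column
print(looks_like_id('005xx000001Sv8z', '001'))  # a User id in the AccountId column
```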

If you don't have AccountIds but you do have an External ID field on Account, the equivalent CSV references each parent through that field:

```csv
FirstName,LastName,Email,Phone,Account.Legacy_Account_Id__c
Alice,Lee,alice@acme.com,415-555-0101,LEG-00001
Bob,Khan,bob@beta.io,415-555-0102,LEG-00002
Carol,Wei,carol@gamma.net,415-555-0103,LEG-00003
```

The Bulk API resolves each Legacy_Account_Id__c value to the corresponding AccountId at write time. This is the right pattern for migration jobs from external systems.

When the headers look right but it still fails

Three less-common causes worth knowing about.

Field-level security denies access. Even if your user has API access and the field exists, field-level security (FLS) can hide the field from the user running the job. The Bulk API treats a hidden field as unavailable and reports it as unrecognized. The fix is to assign the integration user a permission set that grants read and edit access to the field.

Field is on a different record type or page layout. In practice this is rarely the cause: fields are defined on the object rather than on a record type, and the Bulk API ignores page layouts. Record type restrictions tend to fail individual rows rather than the header. If you still suspect one, confirm that the integration user can see the record type, but rule out FLS first.

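Field belongs to a managed package. Packaged fields carry a namespace prefix (fnci__Renewal_Date__c), and the CSV header must include it. Strip the prefix and you'll see unrecognized field. If the package is licensed, the integration user also needs a license for it; otherwise its fields stay hidden, much like an FLS block.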
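The FLS and namespace causes both show up in describe metadata, provided the describe call runs as the integration user: fields hidden by FLS are absent from the result, and each field carries `createable` and `updateable` flags. A minimal sketch with hypothetical helper names, taking the `fields` list from a describe call and a list of CSV headers. It skips dotted relationship headers, which need a separate check against the parent object.

```python
from typing import Dict, Iterable, List

# Describe flags each operation needs; upsert may both create and update.
REQUIRED_FLAGS = {
    'insert': ['createable'],
    'update': ['updateable'],
    'upsert': ['createable', 'updateable'],
}


def unwritable_headers(headers: Iterable[str], fields: List[Dict],
                       operation: str = 'insert') -> Dict[str, str]:
    """Map each problem header to a reason, based on describe() field metadata."""
    by_name = {f['name']: f for f in fields}
    problems: Dict[str, str] = {}
    for h in headers:
        if '.' in h:
            continue  # relationship reference; checked against the parent object
        field = by_name.get(h)
        if field is None:
            problems[h] = 'not in describe: misspelled, missing, or hidden by FLS'
            continue
        missing = [flag for flag in REQUIRED_FLAGS[operation] if not field.get(flag)]
        if missing:
            problems[h] = f"visible but not {' or '.join(missing)} for this user"
    return problems


def namespace_candidates(header: str, api_names: Iterable[str]) -> List[str]:
    """Suggest packaged field names whose un-prefixed form equals header."""
    matches = []
    for name in api_names:
        parts = name.split('__')
        # 'fnci__Renewal_Date__c' splits into ['fnci', 'Renewal_Date', 'c'].
        if len(parts) == 3 and '__'.join(parts[1:]) == header:
            matches.append(name)
    return sorted(matches)


fields = [
    {'name': 'Email', 'createable': True, 'updateable': True},
    {'name': 'Name', 'createable': False, 'updateable': False},  # compound, read-only
    {'name': 'fnci__Renewal_Date__c', 'createable': True, 'updateable': True},
]
print(unwritable_headers(['Email', 'Name', 'Renewal_Date__c'], fields))
print(namespace_candidates('Renewal_Date__c', [f['name'] for f in fields]))
```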

The job lifecycle and where the error fires

Bulk API 2.0 jobs go through stages: Open, UploadComplete, InProgress, JobComplete, Aborted, Failed. Salesforce splits the upload into batches internally once the job moves to InProgress, so the InvalidBatch error appears after that transition, typically in the job's errorMessage.

Bulk API 1.0 has a similar flow with explicit batch creation: you POST each batch payload yourself, and a batch that fails reports InvalidBatch in its state message when you check the batch status.

The error doesn't roll back batches that already succeeded. If batches 1, 2, and 3 succeeded and batch 4 fails with InvalidBatch, the records in 1-3 are already in the org. You don't get a do-over for free.

When triaging:

Pull the failed job's or batch's error message. It usually names the field that failed to match.
Pull the header row from the input CSV that fed that job or batch.
Compare character by character. Look for whitespace, BOM, or punctuation that looks identical visually but isn't.

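A common gotcha: an em dash in a CSV column header where you meant a hyphen. The visual difference is a few pixels. The byte-level difference is total: the hyphen is one byte (2D), while the em dash is a different character encoded as three bytes (E2 80 94) in UTF-8.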
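Character-by-character comparison is easier with a script than by eye. A minimal sketch using Python's standard `unicodedata` module (the `non_ascii_chars` name is hypothetical) that lists every non-ASCII character in a header with its position, code point, UTF-8 bytes, and Unicode name. Pair it with `repr()`, which exposes leading and trailing spaces.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[int, str, str, str]]:
    """(index, code point, UTF-8 bytes in hex, Unicode name) for each non-ASCII char."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) > 127:
            found.append((i, f'U+{ord(ch):04X}',
                          ch.encode('utf-8').hex().upper(),
                          unicodedata.name(ch, '<unnamed>')))
    return found


for header in ['Renewal\u2014Date__c', '\ufeffFirstName', 'Email ']:
    print(repr(header), non_ascii_chars(header))
```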

Validating CSVs before sending them

A pre-flight check catches most InvalidBatch errors before they hit Salesforce.

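```python
import csv


def validate_csv(filename, expected_fields):
    """Check the header row against known API names and return the data-row count."""
    # Plain 'utf-8', not 'utf-8-sig': a BOM then stays visible as '\ufeff' on the
    # first header and gets reported, since the Bulk API won't strip it for you.
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(filename, 'r', encoding='utf-8', newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        # Exact comparison: case, whitespace, and invisible characters all count.
        unknown = [h for h in headers if h not in expected_fields]
        if unknown:
            # Formatting the list shows each header's repr, so trailing spaces
            # and characters like '\ufeff' are visible in the message.
            raise ValueError(f'Unknown headers: {unknown}')
        row_count = sum(1 for _ in reader)
    return row_count


expected = {
    'FirstName', 'LastName', 'Email', 'Phone', 'AccountId', 'OwnerId'
}
n = validate_csv('contacts.csv', expected)
print(f'Validated {n} rows.')
```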

Run this against the file your transformation script produces, before invoking the Bulk API. If it fails locally, you fix the transformation and don't waste a Bulk API job.

You can fetch the legitimate field API names with a one-time describe call:

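```python
from simple_salesforce import Salesforce

# Describe as the integration user: the result only includes fields that user
# can see, so the set also reflects field-level security.
sf = Salesforce(instance_url='...', session_id='...')
contact_meta = sf.Contact.describe()
api_names = {f['name'] for f in contact_meta['fields']}
```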

Store the set, use it as expected_fields in your validator.

Format-specific gotchas

Comma in a value. The value must be quoted: "123 Main St, Suite 200". The CSV writer should handle this automatically; problems arise when the file was hand-assembled.

Quote in a value. The quote must be escaped by doubling it inside a quoted field: "She said ""hello"" at the meeting". Single backslash escapes are not supported.

Newline in a value. Quoted fields can contain newlines. Some parsers don't expect this. If your source data has multi-line text in a Description field, double-check that your CSV writer quoted those values.

Date format. Salesforce expects ISO 8601: 2026-05-24 for dates, 2026-05-24T15:00:00.000Z for datetimes. Spreadsheet tools love to reformat dates into US or UK locale-specific shapes that Salesforce rejects with a different but related error.
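Letting the `csv` module write the file handles the first three cases, and formatting dates explicitly handles the fourth. A minimal Python sketch: the rows are hypothetical sample data, and `sf_date` and `sf_datetime` are illustrative helper names. `sf_datetime` refuses naive datetimes rather than guessing a timezone.

```python
import csv
import io
from datetime import date, datetime, timezone

rows = [
    ['Name', 'BillingStreet', 'Description'],
    ['Acme Corp', '123 Main St, Suite 200',
     'She said "hello" at the meeting\nFollow up in May'],
]
buf = io.StringIO()
# The default QUOTE_MINIMAL quotes fields containing the delimiter, a quote,
# or a line break, and doubles any embedded quote characters.
csv.writer(buf, lineterminator='\n').writerows(rows)
print(buf.getvalue())


def sf_date(d: date) -> str:
    """ISO 8601 date, e.g. '2026-05-24'."""
    if isinstance(d, datetime):
        raise TypeError('pass a date, not a datetime')
    return d.isoformat()


def sf_datetime(dt: datetime) -> str:
    """ISO 8601 UTC datetime with milliseconds, e.g. '2026-05-24T15:00:00.000Z'."""
    if dt.tzinfo is None:
        raise ValueError('naive datetime: attach a timezone before exporting')
    utc = dt.astimezone(timezone.utc)
    return utc.strftime('%Y-%m-%dT%H:%M:%S') + f'.{utc.microsecond // 1000:03d}Z'


# A US-style spreadsheet date, parsed explicitly instead of trusting the locale.
us = datetime.strptime('05/24/2026', '%m/%d/%Y').date()
print(sf_date(us), sf_datetime(datetime(2026, 5, 24, 15, 0, tzinfo=timezone.utc)))
```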

When the file is too large

Bulk API 2.0 accepts up to 150MB per job upload (compressed) or 100MB uncompressed. Larger files need to be split. The recommended pattern is one file per "logical chunk" of work (say, 100,000 to 250,000 rows each) submitted as separate jobs.
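A minimal sketch of the split in Python, assuming row count is a good enough proxy for size (check each part's byte size against the limit if your rows are wide). The `split_csv` name and the `part_NNNN.csv` naming are hypothetical. Every part repeats the header, since each job needs its own.

```python
import csv
import os
import tempfile
from typing import List


def split_csv(src: str, out_dir: str, rows_per_file: int = 100_000) -> List[str]:
    """Split src into files of at most rows_per_file data rows, each with the header."""
    paths: List[str] = []
    fout = None
    writer = None
    count = 0
    try:
        # utf-8-sig drops a BOM if the source has one, so the parts come out clean.
        with open(src, encoding='utf-8-sig', newline='') as fin:
            reader = csv.reader(fin)
            header = next(reader, None)
            if header is None:
                return paths  # empty source: nothing to split
            for row in reader:
                if writer is None or count == rows_per_file:
                    if fout is not None:
                        fout.close()
                    path = os.path.join(out_dir, f'part_{len(paths) + 1:04d}.csv')
                    fout = open(path, 'w', encoding='utf-8', newline='')
                    writer = csv.writer(fout, lineterminator='\n')
                    writer.writerow(header)  # each job needs its own header row
                    paths.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
    finally:
        if fout is not None:
            fout.close()
    return paths


# Demo: five data rows split two per file gives three parts.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'contacts.csv')
    with open(src, 'w', encoding='utf-8', newline='') as f:
        f.write('FirstName,LastName\n' + ''.join(f'User{i},Test\n' for i in range(5)))
    parts = split_csv(src, tmp, rows_per_file=2)
    part_contents = []
    for p in parts:
        with open(p, encoding='utf-8', newline='') as f:
            part_contents.append(f.read())
print(len(part_contents))
```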

PK Chunking won't help here: it splits Bulk API queries (data extraction) by record ID ranges, not uploads. For very large loads, split the file yourself, or use Data Loader with its Bulk API setting enabled, which handles the chunking for you.

Test patterns

Two tests for any CSV-to-Bulk-API pipeline:

A header-validation test that feeds in known-good and known-bad CSV files and confirms the validator accepts the first and rejects the second. Include cases for BOM, trailing whitespace, label-vs-API-name, and missing required fields.

A round-trip test that writes a small CSV, runs it through the Bulk API on a sandbox, and confirms the records appear in the org with the expected values. Run this on every change to the transformation logic.
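A minimal pytest-style sketch of the header-validation test. `header_problems` is a hypothetical stand-in for your own validator, and the required-field set assumes Contact, where LastName is required. The cases cover a BOM, trailing whitespace, labels instead of API names, and a missing required field.

```python
import csv
import io
from typing import List

API_NAMES = {'FirstName', 'LastName', 'Email', 'AccountId', 'OwnerId'}
REQUIRED = {'LastName'}


def header_problems(csv_text: str) -> List[str]:
    """Hypothetical validator under test: list problems with the header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = [f'unknown header {h!r}' for h in header if h not in API_NAMES]
    problems += [f'missing required {r}' for r in sorted(REQUIRED - set(header))]
    return problems


def test_accepts_api_names():
    assert header_problems('FirstName,LastName,Email\nAlice,Lee,a@acme.com\n') == []


def test_rejects_bom():
    assert header_problems('\ufeffFirstName,LastName\n') == ["unknown header '\\ufeffFirstName'"]


def test_rejects_trailing_whitespace():
    assert "unknown header 'Email '" in header_problems('LastName,Email \n')


def test_rejects_labels():
    problems = header_problems('First Name,Last Name\n')
    assert "unknown header 'First Name'" in problems
    assert 'missing required LastName' in problems


def test_rejects_missing_required_field():
    assert header_problems('FirstName,Email\n') == ['missing required LastName']
```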

Related errors

The CSV family overlaps with other Bulk API errors. Knowing which is which speeds triage:

InvalidBatch: invalid CSV header / unrecognized field (the topic of this page)
InvalidBatch: Field is not writeable (the field exists but is read-only, such as a formula field, or FLS blocks the write)
Row 47: REQUIRED_FIELD_MISSING (record-level; the row passed parse but failed validation)
Row 312: DUPLICATE_VALUE (a unique-constraint violation at the field level)
JobAlreadyFinalized (you tried to upload to a job that already closed)

Habits that prevent CSV pain

Generate CSVs programmatically, not by hand. A function that writes the file with quoted fields, escaped quotes, and a known encoding eliminates entire categories of bugs.

Always use the field API names, not labels. Document this loudly in your team's data-load runbook.

Always validate before sending. The Bulk API costs you a job-attempt every time it rejects. Local validation costs you nothing.

Write a sample file at the smallest possible scale, send it manually, and confirm acceptance before running a multi-million row job.
