Salesforce Dictionary - Free Salesforce Glossary
Salesforce Architect
hard

How do you architect Salesforce for billions of records?

Billions of records is firmly in the territory of Big Objects, archive strategies, and data tiering.

Reality check:

  • Standard and custom objects: practical limit ~hundreds of millions.
  • Big Objects: designed for billions. Index-driven access only.
  • External warehouses (Snowflake / BigQuery): for analytics on billions.
  • Salesforce Connect / External Objects: read-only views from external storage.

Strategy:

1. Data tiering.

  • Hot tier — recent/active records in standard objects. Indexed, queryable normally.
  • Warm tier — older but still occasionally accessed in Big Objects.
  • Cold tier — historical/archived in external warehouse.

Migration between tiers runs on scheduled jobs (Batch Apex moving records from standard objects to Big Objects; an ETL/data pipeline moving Big Object data to the external warehouse).
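
The tiering decision itself is simple age-based routing. A minimal Python sketch — the thresholds here are illustrative assumptions, not platform limits; real cut-offs depend on volume and access patterns:

```python
from datetime import date

# Hypothetical cut-offs -- tune to actual access patterns.
HOT_DAYS = 365        # ~1 year stays in standard objects
WARM_DAYS = 365 * 5   # ~5 years stays in Big Objects

def tier_for(last_activity: date, today: date) -> str:
    """Route a record to hot / warm / cold based on age since last activity."""
    age = (today - last_activity).days
    if age <= HOT_DAYS:
        return "hot"    # standard object
    if age <= WARM_DAYS:
        return "warm"   # Big Object
    return "cold"       # external warehouse

today = date(2024, 6, 1)
print(tier_for(date(2024, 1, 1), today))   # "hot"
print(tier_for(date(2021, 1, 1), today))   # "warm"
print(tier_for(date(2010, 1, 1), today))   # "cold"
```

A scheduled job would run this classification in bulk and enqueue the cross-tier moves; the point is that tier assignment is a pure function of record age, so it stays testable outside the platform.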

2. Read-path optimisation.

For hot tier:

  • Custom Indexes on filter-heavy fields.
  • Selective queries mandatory.
  • Skinny Tables for frequently-accessed standard objects.
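
"Selective queries mandatory" can be partially enforced in CI. A heuristic lint, sketched in Python, that flags the documented non-selective SOQL filter shapes (negative operators, leading-wildcard LIKE, null comparisons) — a lint, not a guarantee of index use:

```python
import re

# Patterns that typically defeat index use in SOQL filters.
NON_SELECTIVE = [
    r"!=",              # negative operator
    r"\bNOT\s+IN\b",    # negative set membership
    r"LIKE\s+'%",       # leading-wildcard LIKE
    r"=\s*null",        # null comparison
]

def looks_selective(where_clause: str) -> bool:
    """Return False if the WHERE clause matches a known non-selective pattern."""
    return not any(re.search(p, where_clause, re.IGNORECASE)
                   for p in NON_SELECTIVE)

print(looks_selective("Region__c = 'EMEA' AND CreatedDate = LAST_N_DAYS:30"))  # True
print(looks_selective("Status__c != 'Closed'"))                                # False
print(looks_selective("Name LIKE '%corp'"))                                    # False
```

Salesforce's own Query Plan tool is the authoritative check; a lint like this just catches the obvious cases before code review.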

For Big Objects:

  • Async SOQL for analytical queries.
  • Index-based queries only — no ad-hoc.
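
"Index-based queries only" is a hard rule: a Big Object SOQL filter must use a contiguous leading prefix of the object's composite index, in definition order, with no gaps. A sketch of that check (field names are hypothetical):

```python
# A Big Object's composite index, in definition order (hypothetical fields).
INDEX_FIELDS = ["Account__c", "Interaction_Date__c", "Channel__c"]

def valid_big_object_filter(filtered_fields: list) -> bool:
    """Filters must form a contiguous leading prefix of the index."""
    return filtered_fields == INDEX_FIELDS[:len(filtered_fields)]

print(valid_big_object_filter(["Account__c"]))                          # True
print(valid_big_object_filter(["Account__c", "Interaction_Date__c"]))  # True
print(valid_big_object_filter(["Channel__c"]))                         # False: skips leading fields
```

This is why index design is the single most important Big Object decision: you are choosing, up front, the only query shapes that will ever work.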

For external:

  • External objects via Salesforce Connect for visibility.
  • Direct queries against the warehouse from BI tools.

3. Write-path.

  • Bulk API 2.0 for inbound.
  • Batch Apex for in-Salesforce processing.
  • Pub/Sub API for outbound replication.
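
On the write path, client-side staging still batches records before handing them off. A minimal chunking sketch — 2,000 mirrors the Batch Apex maximum scope size; Bulk API 2.0 does its own server-side chunking, so this only matters for the staging layer:

```python
def chunk(records, size=2000):
    """Split an inbound record set into fixed-size batches for staged loading."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

batches = list(chunk([{"id": n} for n in range(5000)], size=2000))
print([len(b) for b in batches])  # [2000, 2000, 1000]
```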

4. Sharing model.

Private OWD on billions of records = expensive: sharing recalculation at that scale can run for hours or days. Strategies:

  • Public Read if business allows.
  • Apex Managed Sharing with surgical RowCause.
  • Defer Sharing Calculations during bulk operations.
  • Skip ownership tracking on archive tier.

5. Reporting and analytics.

  • Standard reports time out or truncate at this scale — they won't work on billions.
  • CRM Analytics for in-Salesforce analytics on aggregated data.
  • External BI (Tableau, Power BI) querying warehouse for full-scale analytics.
  • Reporting Snapshots for historical aggregates.
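
The common thread in the reporting options above is pre-aggregation: reports read a small rollup table, never the billion-row detail. A Python sketch of the rollup shape a Reporting Snapshot or warehouse summary table would store (rows and amounts are made-up sample data):

```python
from collections import defaultdict
from datetime import date

# Stand-ins for billion-row detail that never hits a report directly.
rows = [
    {"day": date(2024, 5, 1), "amount": 10},
    {"day": date(2024, 5, 1), "amount": 15},
    {"day": date(2024, 6, 2), "amount": 7},
]

def monthly_rollup(rows):
    """Collapse detail rows into per-month aggregates."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0})
    for r in rows:
        key = (r["day"].year, r["day"].month)
        totals[key]["count"] += 1
        totals[key]["amount"] += r["amount"]
    return dict(totals)

print(monthly_rollup(rows))
# {(2024, 5): {'count': 2, 'amount': 25}, (2024, 6): {'count': 1, 'amount': 7}}
```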

6. Data lifecycle.

  • Retention policies — archive after N years.
  • Right-to-be-forgotten for GDPR / privacy.
  • Audit retention separate from operational.
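
The lifecycle rules above reduce to a per-record decision function. A sketch, with hypothetical retention periods — note the forget request overrides age-based retention, which is the GDPR erasure requirement:

```python
from datetime import date

ARCHIVE_AFTER_YEARS = 7   # hypothetical operational retention
PURGE_AFTER_YEARS = 10    # hypothetical audit retention, kept separate

def lifecycle_action(created: date, today: date, forget_requested: bool) -> str:
    """Decide a record's fate under retention + right-to-be-forgotten rules."""
    if forget_requested:
        return "purge"    # erasure request trumps retention schedule
    age_years = (today - created).days / 365.25
    if age_years >= PURGE_AFTER_YEARS:
        return "purge"
    if age_years >= ARCHIVE_AFTER_YEARS:
        return "archive"
    return "retain"

today = date(2024, 1, 1)
print(lifecycle_action(date(2020, 1, 1), today, False))  # "retain"
print(lifecycle_action(date(2015, 1, 1), today, False))  # "archive"
print(lifecycle_action(date(2010, 1, 1), today, True))   # "purge"
```

In practice legal-hold and audit-trail obligations complicate the purge branch; the sketch shows only the base policy.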

7. Operational considerations.

  • Sandbox refresh time — Full sandboxes with billions of records refresh slowly.
  • Backup strategy — Salesforce native plus external backup tooling.
  • Disaster recovery — billions don't recover quickly.

Real-world billion-row architectures:

  • High-volume customer interactions (telecom, retail, IoT) — typically tier with Big Objects + external warehouse.
  • Audit logs — extreme volume; usually archived to external storage.
  • IoT sensor data — Salesforce-as-system-of-engagement; raw data in warehouse.

Senior architect insight: billions of records on Salesforce is rarely the right answer. Salesforce is optimised for CRM-shape data, not log-shape data. The right answer is usually:

  • Salesforce holds the active operational view.
  • External warehouse holds historical / log / analytics data.
  • Integration sits between.

Resist the urge to keep everything in Salesforce. The platform's strengths are operational; bulk historical data lives elsewhere.

When you genuinely have billions of records that must be in Salesforce: Big Objects + careful design + Salesforce Support involvement (custom indexes, skinny tables, possibly database tuning).

Why this answer works

Senior architecture. The tiering strategy and "Salesforce isn't optimised for log-shape data" insight are mature.
