Billions of records is firmly in the territory of Big Objects, archive strategies, and data tiering.
Reality check:
- Standard and custom objects: practical ceiling is in the hundreds of millions of rows per object.
- Big Objects: designed for billions of rows; access is index-driven only.
- External warehouses (Snowflake / BigQuery): for analytics on billions of rows.
- Salesforce Connect / External Objects: typically read-only views over external storage.
Strategy:
1. Data tiering.
- Hot tier — recent/active records kept in standard/custom objects; indexed and queryable normally.
- Warm tier — older records that are still occasionally accessed, moved to Big Objects.
- Cold tier — historical/archived data in an external warehouse.
Migrate between tiers with scheduled jobs: Batch Apex moves records from standard objects into Big Objects, and an ETL/data pipeline moves Big Object data on to the external warehouse.
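A minimal sketch of the hot-to-warm move, assuming a hypothetical `Case_Archive__b` Big Object whose custom index is `Account__c` then `Closed_Date__c`; the object and field names are illustrative, not a shipped schema:

```apex
// Nightly batch: copy aged, closed Cases into the Big Object, then delete
// the originals from the hot tier.
public class CaseArchiveBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Hot-tier records that are closed and more than two years old
        return Database.getQueryLocator(
            'SELECT Id, CaseNumber, Subject, ClosedDate, AccountId ' +
            'FROM Case WHERE IsClosed = true AND ClosedDate < LAST_N_YEARS:2'
        );
    }

    public void execute(Database.BatchableContext bc, List<Case> scope) {
        List<Case_Archive__b> archive = new List<Case_Archive__b>();
        for (Case c : scope) {
            archive.add(new Case_Archive__b(
                Case_Number__c = c.CaseNumber,
                Subject__c     = c.Subject,
                Closed_Date__c = c.ClosedDate,
                Account__c     = c.AccountId
            ));
        }
        // Big Object writes commit immediately and sit outside the transaction;
        // a real job would inspect the results before deleting the originals.
        Database.SaveResult[] results = Database.insertImmediate(archive);
        delete scope;
    }

    public void finish(Database.BatchableContext bc) {
        // Chain the next tier move or send a completion notification here.
    }
}
```

Schedule it with the Apex Scheduler and keep the batch scope small enough that each chunk stays inside DML and heap limits.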
2. Read-path optimisation.
For hot tier:
- Custom indexes on heavily filtered fields.
- Selective queries are mandatory; non-selective filters on a large object degrade to full scans and time out.
- Skinny tables (provisioned by Salesforce Support) for frequently accessed standard objects.
For Big Objects:
- Async SOQL for analytical queries.
- Index-based queries only — no ad-hoc filtering (see the query sketch after this list).
For external:
- External objects via Salesforce Connect for visibility.
- Direct queries against the warehouse from BI tools.
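Two query shapes for the Salesforce-side tiers, reusing the hypothetical `Case_Archive__b` object: a selective hot-tier filter, and a warm-tier query that follows the Big Object index order (equality on the leading index field, a range only on the last filtered one):

```apex
public class TieredReads {

    // Hot tier: selective filter on indexed fields keeps the optimiser on an
    // index scan instead of a full table scan.
    public static List<Case> recentCases(Id accountId) {
        return [
            SELECT Id, Subject
            FROM Case
            WHERE AccountId = :accountId
              AND CreatedDate = LAST_N_DAYS:30
            LIMIT 200
        ];
    }

    // Warm tier: Big Object SOQL must filter the index fields in order, with
    // no gaps (assumed index: Account__c, then Closed_Date__c).
    public static List<Case_Archive__b> archivedCases(Id accountId, Datetime cutoff) {
        return [
            SELECT Case_Number__c, Subject__c, Closed_Date__c
            FROM Case_Archive__b
            WHERE Account__c = :accountId
              AND Closed_Date__c >= :cutoff
        ];
    }
}
```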
3. Write-path.
- Bulk API 2.0 for inbound.
- Batch Apex for in-Salesforce processing.
- Pub/Sub API for outbound replication.
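For the outbound leg, Change Data Capture is the zero-code option; where the payload needs shaping, a trigger can publish a custom platform event that external consumers read through the Pub/Sub API. A sketch, assuming a hypothetical `Case_Change__e` platform event:

```apex
// Publish a change event per modified Case; an external subscriber listening
// via the Pub/Sub API replicates it into the warehouse.
trigger CaseReplication on Case (after insert, after update) {
    List<Case_Change__e> events = new List<Case_Change__e>();
    for (Case c : Trigger.new) {
        events.add(new Case_Change__e(
            Record_Id__c = c.Id,
            Status__c    = c.Status
        ));
    }
    EventBus.publish(events);
}
```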
4. Sharing model.
Private OWD (org-wide default) on billions of records makes sharing recalculation extremely expensive. Strategies:
- Public Read Only if the business allows it.
- Apex managed sharing with surgical, custom RowCause values (sketch after this list).
- Defer Sharing Calculations during bulk operations.
- Skip ownership tracking on the archive tier; Big Objects carry no record ownership or sharing.
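A sketch of the Apex managed sharing pattern, assuming a hypothetical `Interaction__c` custom object with a custom sharing reason `Service_Team__c` defined in Setup:

```apex
public class InteractionSharingService {
    // Grant a public group read access with a custom RowCause, so the grant
    // survives owner changes and can be targeted for later cleanup.
    public static void shareWithServiceTeam(Id interactionId, Id groupId) {
        Interaction__Share share = new Interaction__Share(
            ParentId      = interactionId,
            UserOrGroupId = groupId,
            AccessLevel   = 'Read',
            RowCause      = Schema.Interaction__Share.RowCause.Service_Team__c
        );
        Database.insert(share, false); // partial success; log failures in real code
    }
}
```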
5. Reporting and analytics.
- Standard reports won't work on billions.
- CRM Analytics for in-Salesforce analytics on aggregated data.
- External BI (Tableau, Power BI) querying warehouse for full-scale analytics.
- Reporting Snapshots for historical aggregates.
6. Data lifecycle.
- Retention policies — archive after N years.
- Right-to-be-forgotten for GDPR and similar privacy regimes — deletion must reach every tier (sketch below).
- Audit retention handled separately from operational retention.
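A sketch of the right-to-be-forgotten path across the first two tiers, reusing the hypothetical `Case_Archive__b` object; Big Object rows are removed with Database.deleteImmediate, and the cold tier needs an equivalent deletion job in the warehouse:

```apex
public class ForgetMeService {
    public static void purge(Id accountId) {
        // Hot tier: normal DML (batched in practice for high volumes)
        delete [SELECT Id FROM Case WHERE AccountId = :accountId];

        // Warm tier: query by the leading index field, then delete immediately;
        // the selected rows must carry all index fields.
        List<Case_Archive__b> rows = [
            SELECT Account__c, Closed_Date__c
            FROM Case_Archive__b
            WHERE Account__c = :accountId
        ];
        Database.deleteImmediate(rows);
    }
}
```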
7. Operational considerations.
- Sandbox refresh time — Full sandboxes with billions of records refresh slowly.
- Backup strategy — Salesforce native plus external backup tooling.
- Disaster recovery — billions of records do not restore quickly; plan and test realistic recovery times.
Real architectures at billion-row scale:
- High-volume customer interactions (telecom, retail, IoT) — typically tier with Big Objects + external warehouse.
- Audit logs — extreme volume; usually archived to external storage.
- IoT sensor data — Salesforce-as-system-of-engagement; raw data in warehouse.
Senior architect insight: billions of records on Salesforce is rarely the right answer. Salesforce is optimised for CRM-shape data, not log-shape data. The right answer is usually:
- Salesforce holds the active operational view.
- External warehouse holds historical / log / analytics data.
- Integration sits between.
Resist the urge to keep everything in Salesforce. The platform's strengths are operational; bulk historical data lives elsewhere.
When you genuinely have billions of records that must be in Salesforce: Big Objects + careful design + Salesforce Support involvement (custom indexes, skinny tables, possibly database tuning).
