Salesforce Dictionary - Free Salesforce Glossary
Salesforce Administrator
hard

How would you architect a strategy for Large Data Volumes (LDV)?

Large Data Volume (LDV) is the informal term for a Salesforce org whose objects hold millions to hundreds of millions of records, the point at which standard query, reporting, and sharing patterns start to break down.

Strategy elements:

  1. Indexing:
  • Salesforce automatically indexes Id, Name, foreign keys (lookup and master-detail fields), unique fields, External ID fields, and certain system fields such as CreatedDate. Add Custom Indexes on filter-heavy fields via a support case (admins can request; Salesforce engineering provisions).
  • Selective queries — for queries against LDV objects, the WHERE clause must hit an indexed field and target a small fraction of the table. Non-selective queries fall back to full table scans: they time out, and in triggers they throw a non-selective query error once the object passes 200,000 rows.
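To make the selectivity point concrete, here is a sketch in Apex. All object and field names are hypothetical: assume a custom object Invoice__c with roughly 40M rows, where Posted_Date__c carries a custom index and Status__c does not.

```apex
// Selective: the WHERE clause hits the indexed Posted_Date__c field and
// targets a small fraction of rows, so the query optimizer can use the index.
List<Invoice__c> recent = [
    SELECT Id, Name
    FROM Invoice__c
    WHERE Posted_Date__c = LAST_N_DAYS:30
    LIMIT 10000
];

// Non-selective: filtering only on an unindexed field (or using a negative
// operator, leading wildcard, or NULL comparison) forces a full scan. On an
// LDV object this times out, or throws a non-selective query error in a trigger.
List<Invoice__c> risky = [
    SELECT Id FROM Invoice__c WHERE Status__c != 'Closed'
];
```

Note that negative operators like `!=` defeat indexes even on indexed fields; rewriting the filter as a positive IN list keeps it selective.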
  2. Skinny Tables — for the most-accessed standard objects, Salesforce can provision a Skinny Table (a denormalised copy of high-frequency columns). This is a Salesforce-managed performance feature, requested via support.
  3. Archiving:
  • Move old data to Big Objects for compliance retention, or to external warehousing (Snowflake/BigQuery via Data Cloud or middleware).
  • Schedule delete-and-purge of old data once archived.
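A minimal archiving sketch, assuming a hypothetical custom object Invoice__c and a matching custom Big Object Invoice_Archive__b whose index covers the copied fields:

```apex
// Copy closed invoices into a custom Big Object before purging the
// originals. Every object and field name here is illustrative.
public class InvoiceArchiver {
    public static void archive(List<Invoice__c> invoices) {
        List<Invoice_Archive__b> rows = new List<Invoice_Archive__b>();
        for (Invoice__c inv : invoices) {
            rows.add(new Invoice_Archive__b(
                Source_Id__c   = inv.Id,
                Posted_Date__c = inv.Posted_Date__c,
                Amount__c      = inv.Amount__c
            ));
        }
        // Big Object writes sit outside the normal transaction:
        // insertImmediate commits right away and cannot be rolled back.
        Database.insertImmediate(rows);
    }
}
```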
  4. Sharing model implications:
  • Private OWD on LDV objects + millions of rows = expensive sharing recalculation. Use Public Read/Write (or Public Read Only) where data sensitivity allows.
  • Defer Sharing Calculations during bulk loads.
  • Consider Apex Managed Sharing with carefully designed RowCause for surgical sharing instead of broad sharing rules.
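A sketch of the Apex Managed Sharing approach, assuming a hypothetical custom object Invoice__c with a custom Apex sharing reason named Regional_Access__c defined on it:

```apex
// Grant read access on specific Invoice__c records to a group or queue,
// instead of a broad sharing rule that forces org-wide recalculation.
public without sharing class InvoiceSharingService {
    public static void shareWithGroup(List<Invoice__c> invoices, Id groupId) {
        List<Invoice__Share> shares = new List<Invoice__Share>();
        for (Invoice__c inv : invoices) {
            shares.add(new Invoice__Share(
                ParentId      = inv.Id,
                UserOrGroupId = groupId,
                AccessLevel   = 'Read',
                // The custom RowCause survives sharing recalculation and
                // documents why this row was shared.
                RowCause      = Schema.Invoice__Share.RowCause.Regional_Access__c
            ));
        }
        // Allow partial success so one bad row does not roll back the batch.
        Database.insert(shares, false);
    }
}
```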
  5. Reporting:
  • Standard reports degrade past about 1M rows on the source object. Use CRM Analytics for cross-million-row analytics.
  • Reporting Snapshots capture aggregated point-in-time stats to avoid re-querying live data.
  6. Mass operations:
  • Use Bulk API for any data load over 5,000 rows.
  • Use batch Apex or Queueable with proper batch sizing for LDV processing in code.
  • Schedule heavy jobs for off-hours.
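The batch-sizing advice above can be sketched as a Batch Apex purge job; object and field names are hypothetical, and the start() query filters on an indexed date field so the QueryLocator itself stays selective:

```apex
// Purge rows that have already been archived elsewhere.
public class InvoicePurgeBatch implements Database.Batchable<SObject> {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Indexed date filter keeps the locator query selective.
        return Database.getQueryLocator(
            'SELECT Id FROM Invoice__c WHERE Posted_Date__c < LAST_N_YEARS:2'
        );
    }
    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        delete scope;
        // Skip the recycle bin so storage is reclaimed immediately.
        Database.emptyRecycleBin(scope);
    }
    public void finish(Database.BatchableContext bc) {}
}
// Run off-hours with a smaller scope than the 2,000 maximum to keep
// row locks short:
// Database.executeBatch(new InvoicePurgeBatch(), 500);
```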
  7. Data model:
  • Avoid roll-up summaries on LDV objects — they recalculate frequently and are slow.
  • Partition data by date or category if possible — date-based partitioning lets you query "last 90 days" with a date-indexed query.
  8. Monitoring:
  • Setup -> Storage Usage lists per-object record counts (System Overview gives the org-wide summary).
  • Event Monitoring captures slow query logs.
  • Alert when sharing recalc queues build up.

Common pitfalls:

  • Hitting the 50M+ row threshold on a single object, past which lookups and list views become unreliable.
  • Sharing recalculation taking 8+ hours because of an unconsidered change to a sharing rule or the role hierarchy.
  • Reports that worked at 100K rows start timing out at 5M rows.

LDV strategy is more architect than admin, but admins are the front line of detecting when it's becoming relevant. Watch row counts; raise concerns before users feel slowdowns.

Why this answer works

Architect-level question. The Custom Index, Skinny Table, and Defer Sharing Calculations features are the strongest signals of LDV experience. Acknowledging where admin ends and architect begins is mature.
