How do you architect Salesforce for high availability?

High availability (HA) means the system stays up despite failures. As a SaaS platform, Salesforce provides baseline HA; architects layer additional resilience on top.

Salesforce-provided HA:

Multi-instance deployment — Salesforce runs in geographically distributed data centers.
99.95% uptime SLA — measured monthly.
Automatic failover within data center.
Salesforce Trust dashboard for status.
Disaster Recovery (DR) with Recovery Time Objective (RTO) ~12 hours.

You don't manage the infrastructure; Salesforce does.

Where architects influence HA:

1. Avoid single points of failure (SPOFs) in custom code.

Don't depend on one specific person's user account for scheduled jobs (deactivating that user kills the jobs); schedule them under a dedicated automation user instead.
Don't depend on one external system without fallback.
Don't depend on one Connected App without backup.

2. Resilient integrations.

Retries, dead-letter queues, circuit breakers.
Asynchronous patterns where possible.
Idempotent operations.
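
The three resilience techniques above can be sketched together in a few lines. This is a minimal illustration, not a production client: `CircuitBreaker`, `retry_with_backoff`, and `deliver` are hypothetical names, the thresholds are arbitrary, and `send` stands in for whatever callout the integration really makes.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down has passed."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_backoff(func, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying failures with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying is pointless while the breaker is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def deliver(message, send, dead_letters, **retry_kwargs):
    """Try to send; park the message in a dead-letter list if retries run out."""
    try:
        return retry_with_backoff(lambda: send(message), **retry_kwargs)
    except Exception:
        dead_letters.append(message)
        return None
```

Retries are only safe when the receiving operation is idempotent, which is why the three items belong together: a retried or replayed dead-letter message must not create a duplicate record.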

3. Capacity planning.

No single integration should be able to exhaust the org's API call limits.
Storage limits monitored, expanded before crisis.
Sandbox capacity available for emergency dev.
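
A small check like the following can turn limit monitoring into an alert. It is a sketch under assumptions: the input is a dictionary shaped like the REST API Limits resource response (each limit has `Max` and `Remaining`), fetching that response is left out, and the sample numbers and the 80% threshold are made up for illustration.

```python
def limits_over_threshold(limits, threshold=0.8):
    """Return {limit_name: fraction_used} for limits at or above the threshold."""
    flagged = {}
    for name, values in limits.items():
        maximum = values.get("Max", 0)
        if maximum <= 0:
            continue  # skip limits that have no meaningful ceiling
        used = (maximum - values.get("Remaining", maximum)) / maximum
        if used >= threshold:
            flagged[name] = round(used, 3)
    return flagged


# Illustrative values only; a real org reports its own numbers.
sample = {
    "DailyApiRequests": {"Max": 100000, "Remaining": 12000},
    "DataStorageMB": {"Max": 10240, "Remaining": 9000},
}
print(limits_over_threshold(sample))
```

Running the check on a schedule and alerting on a non-empty result gives time to throttle a noisy integration or buy storage before the limit is actually hit.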

4. Disaster recovery.

Backup strategy — Salesforce native + external backup tools.
Restore testing — periodic drills.
Cross-region for critical data — Data Cloud or external warehouse.
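
A restore drill is easier to judge when it produces numbers. The sketch below assumes the usual definitions (achieved RPO is the gap between the last good backup and the failure; achieved RTO is the gap between the failure and the completed restore). `drill_report` and the timestamps used with it are hypothetical.

```python
from datetime import datetime, timedelta


def drill_report(last_backup, failure, restored, rpo_target, rto_target):
    """Compare the RPO and RTO achieved in a drill against their targets."""
    achieved_rpo = failure - last_backup   # data written in this window is lost
    achieved_rto = restored - failure      # time the service was unavailable
    return {
        "rpo": achieved_rpo,
        "rto": achieved_rto,
        "rpo_met": achieved_rpo <= rpo_target,
        "rto_met": achieved_rto <= rto_target,
    }


# Made-up drill: nightly backup, failure mid-morning, restore done by 15:00.
report = drill_report(
    last_backup=datetime(2024, 1, 10, 0, 0),
    failure=datetime(2024, 1, 10, 9, 30),
    restored=datetime(2024, 1, 10, 15, 0),
    rpo_target=timedelta(hours=4),
    rto_target=timedelta(hours=8),
)
print(report["rpo_met"], report["rto_met"])
```

In this made-up drill the restore finishes within the RTO target, but a nightly backup cannot meet a four-hour RPO, which is the kind of gap a drill is meant to surface.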

5. Multi-org HA.

For organizations that need higher availability than Salesforce's standard HA provides:

Multiple Salesforce orgs in different regions.
Active-active — both orgs serving traffic; sync between.
Active-passive — one primary, one standby.
Federated identity — users access either.

This is unusual; complexity is high. Reserved for absolutely-critical use cases.

6. Application-level resilience.

Graceful degradation when external systems fail.
Cached fallbacks for read paths.
User communication during outages.
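
Graceful degradation with a cached fallback might look like this. It is a sketch, not a recommended library: `CachedFallback` is a hypothetical wrapper, `fetch` stands in for any read against an external system, and the one-hour staleness limit is an arbitrary assumption.

```python
import time


class CachedFallback:
    """Serve the last good value when the live read fails."""

    def __init__(self, fetch, max_stale=3600.0, clock=time.monotonic):
        self.fetch = fetch
        self.max_stale = max_stale
        self.clock = clock
        self._cache = {}  # key -> (value, time stored)

    def get(self, key):
        try:
            value = self.fetch(key)
        except Exception:
            cached = self._cache.get(key)
            # No cached copy, or the copy is too old to trust: surface the error.
            if cached is None or self.clock() - cached[1] > self.max_stale:
                raise
            return {"value": cached[0], "stale": True}
        self._cache[key] = (value, self.clock())
        return {"value": value, "stale": False}
```

The `stale` flag is what makes user communication possible: the UI can show the cached value with a notice that it may be out of date instead of failing silently or showing an error page.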

7. Monitoring.

Salesforce Trust for platform status.
Synthetic monitoring for key user journeys.
Real-user monitoring for actual experience.
Alerting on degradation.
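
Alerting on degradation needs a rule that turns probe results into a status. One simple option, sketched below, keeps a sliding window of recent synthetic checks and flags the journey when the error rate or median latency crosses a limit. `DegradationMonitor` and its thresholds are assumptions for illustration; real monitoring tools offer richer rules.

```python
from collections import deque
from statistics import median


class DegradationMonitor:
    """Classify a user journey from the most recent probe results."""

    def __init__(self, window=20, max_error_rate=0.2, max_median_ms=2000.0):
        self.samples = deque(maxlen=window)  # old samples fall off automatically
        self.max_error_rate = max_error_rate
        self.max_median_ms = max_median_ms

    def record(self, ok, latency_ms):
        self.samples.append((ok, latency_ms))

    def status(self):
        if not self.samples:
            return "unknown"
        errors = sum(1 for ok, _ in self.samples if not ok)
        error_rate = errors / len(self.samples)
        median_ms = median(ms for _, ms in self.samples)
        if error_rate > self.max_error_rate or median_ms > self.max_median_ms:
            return "degraded"
        return "healthy"
```

A scheduler would call `record` after each synthetic login or page load and raise an alert when `status()` changes from "healthy" to "degraded".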

8. Incident response.

Runbooks for common scenarios.
Communication plan for users / customers.
Postmortems and improvements.

Architectural patterns:

Read replicas via external warehouse: analytics keep running on the last synced data during a Salesforce outage.
Async writes via queue — user actions absorbed during slowness.
Health checks — components self-report; failed components excluded from routing.
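
The async-write pattern can be illustrated with an in-memory buffer. This is a sketch only: `WriteBuffer` is a hypothetical class, `send` stands in for the real write to Salesforce or another system, and a production version would use a durable queue rather than process memory.

```python
from collections import deque


class WriteBuffer:
    """Accept writes immediately and deliver them later, in order."""

    def __init__(self, send):
        self.send = send
        self.pending = deque()
        self.delivered_keys = set()  # idempotency keys already delivered

    def submit(self, key, payload):
        """Queue the write; the caller is not blocked by a slow downstream."""
        self.pending.append((key, payload))

    def drain(self):
        """Deliver queued writes; stop at the first failure and keep the rest."""
        delivered = 0
        while self.pending:
            key, payload = self.pending[0]
            if key not in self.delivered_keys:
                try:
                    self.send(payload)
                except Exception:
                    break  # downstream still unhealthy; try again later
                self.delivered_keys.add(key)
                delivered += 1
            self.pending.popleft()
        return delivered
```

Because each write carries a key, a user who clicks twice during a slowdown produces one downstream write, and a drain that is interrupted can simply be run again.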

Common pitfalls:

Assuming Salesforce is always up — design for the rare outage.
Single Connected App for all integrations — one revocation kills everything.
No backup strategy — Salesforce's native may not meet your RTO/RPO.
No DR drill — backup that's never tested isn't a backup.

Senior architect insight: HA is a spectrum. Salesforce's 99.95% SLA is plenty for most organizations. For mission-critical workloads (banking, life-safety), additional layers (multi-org, external systems) may be justified. Match HA investment to actual need.

Don't over-engineer HA; the complexity itself becomes a failure mode.
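
To make the spectrum concrete, it helps to convert availability percentages into a downtime budget and to see how dependencies compound. The arithmetic below uses the 99.95% figure quoted in this answer; the 99.9% external system is a made-up example, and a 30-day month is assumed.

```python
def downtime_minutes(availability, days=30):
    """Allowed downtime, in minutes, for a given availability over a period."""
    return (1 - availability) * days * 24 * 60


def serial_availability(*components):
    """Availability of a flow that needs every component to be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# 99.95% over 30 days leaves roughly 21.6 minutes of downtime.
print(round(downtime_minutes(0.9995), 1))

# A flow that also needs a 99.9% external system is less available than either part.
combined = serial_availability(0.9995, 0.999)
print(round(combined, 5), round(downtime_minutes(combined), 1))
```

The second calculation shows why a hard dependency on one external system matters: the combined flow has roughly three times the downtime budget of Salesforce alone, so the cheaper fix is often a fallback for the dependency rather than more layers around Salesforce.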
