High availability (HA) means the system stays up despite failures. Salesforce, as a SaaS platform, provides baseline HA; architects layer additional resilience on top.
Salesforce-provided HA:
- Multi-instance deployment — Salesforce runs in geographically distributed data centers.
- 99.95% uptime, measured monthly. Treat this as the commonly quoted figure; the binding commitment is whatever your own contract states.
- Automatic failover within data center.
- Salesforce Trust dashboard for status.
- Disaster Recovery (DR) with Recovery Time Objective (RTO) ~12 hours.
You don't manage the infrastructure; Salesforce does.
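To put the uptime figure in perspective, an availability percentage can be converted into a downtime budget. A minimal Python sketch, assuming a 30-day month; the function name is illustrative and not a Salesforce API.

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed in a period at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.95, 99.99):
        print(f"{pct}% -> {downtime_budget_minutes(pct):.1f} min per 30 days")
```

At 99.95% over 30 days the budget is about 21.6 minutes, so a single disaster-recovery event at the ~12-hour RTO (720 minutes) would consume roughly 33 months of it.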
Where architects influence HA:
1. Avoid single points of failure (SPOFs) in custom code.
- Don't depend on one specific user for scheduled jobs: jobs run as the user who scheduled them, so deactivating that user stops them.
- Don't depend on one external system without fallback.
- Don't depend on one Connected App without backup.
2. Resilient integrations.
- Retries, dead-letter queues, circuit breakers.
- Asynchronous patterns where possible.
- Idempotent operations.
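The integration patterns above can be sketched together in a few lines. This is a simplified illustration, not a production library: all names (`CircuitBreaker`, `retry_with_backoff`, `deliver`, the `idempotency_key` field) are hypothetical, the dead-letter queue is a plain list, and the thresholds are arbitrary.

```python
import time
from typing import Callable, Dict, List, Optional, Set, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down has passed."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow a trial call; one more failure reopens.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


def retry_with_backoff(fn: Callable[[], T], attempts: int = 3,
                       base_delay_s: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn with exponential backoff; re-raise the last error when exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying against an open circuit is pointless
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def deliver(message: Dict[str, str], send: Callable[[Dict[str, str]], object],
            processed_keys: Set[str], dead_letters: List[Dict[str, str]],
            **retry_kwargs) -> bool:
    """Send once per idempotency key; park the message if all retries fail."""
    key = message["idempotency_key"]
    if key in processed_keys:
        return True  # duplicate delivery: already handled, nothing to do
    try:
        retry_with_backoff(lambda: send(message), **retry_kwargs)
    except Exception:
        dead_letters.append(message)  # keep for inspection and replay
        return False
    processed_keys.add(key)
    return True
```

In a real integration the `send` callable would typically be wrapped in `CircuitBreaker.call`, and idempotency would also be enforced on the receiving side, for example by upserting on an external ID rather than inserting.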
3. Capacity planning.
- No single integration should be able to exhaust the org's API call limits.
- Monitor storage limits and expand them before they become a crisis.
- Keep sandbox capacity available for emergency development.
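Limit monitoring can be as simple as a threshold check over the org's limits data. The sketch below assumes the input is the parsed JSON from the Salesforce REST API Limits resource, where each entry carries `Max` and `Remaining`; the function name, the 80% threshold, and the sample numbers are illustrative assumptions.

```python
from typing import Dict, List


def limits_over_threshold(limits: Dict[str, Dict[str, int]],
                          threshold: float = 0.8) -> List[str]:
    """Return a message for every limit whose usage is at or above threshold."""
    alerts = []
    for name, values in limits.items():
        max_value = values.get("Max", 0)
        if max_value <= 0:
            continue  # limit not applicable or not reported
        used = (max_value - values.get("Remaining", max_value)) / max_value
        if used >= threshold:
            alerts.append(f"{name}: {used:.0%} used")
    return alerts


if __name__ == "__main__":
    # Made-up sample values, shaped like a Limits response.
    sample = {
        "DailyApiRequests": {"Max": 100000, "Remaining": 15000},
        "DataStorageMB": {"Max": 10240, "Remaining": 9000},
    }
    for alert in limits_over_threshold(sample):
        print(alert)
```

Running a check like this on a schedule, and tracking usage per integration user, helps show which integration is consuming the shared budget before it runs out.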
4. Disaster recovery.
- Backup strategy — Salesforce native + external backup tools.
- Restore testing — periodic drills.
- Cross-region copies of critical data, e.g. in Data Cloud or an external warehouse.
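A backup strategy is easier to verify when the recovery point objective (RPO, the maximum tolerable window of data loss) is checked automatically. A minimal sketch, assuming the timestamp of the last successful backup is available from whatever backup tool is in use; the function name and the 4-hour RPO in the example are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rpo_breached(last_backup_at: datetime, rpo: timedelta,
                 now: Optional[datetime] = None) -> bool:
    """True when the newest backup is older than the RPO allows."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - last_backup_at > rpo


if __name__ == "__main__":
    last = datetime.now(timezone.utc) - timedelta(hours=6)
    print(rpo_breached(last, rpo=timedelta(hours=4)))
```

The same comparison applies to restore drills: record how long a test restore takes and compare it against the RTO.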
5. Multi-org HA.
For organizations needing higher availability than Salesforce's standard HA provides:
- Multiple Salesforce orgs in different regions.
- Active-active: both orgs serve traffic, with data synchronized between them.
- Active-passive: one primary, one standby.
- Federated identity: users can sign in to either org.
This is unusual and the complexity is high; reserve it for absolutely critical use cases.
6. Application-level resilience.
- Graceful degradation when external systems fail.
- Cached fallbacks for read paths.
- User communication during outages.
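Graceful degradation on a read path often means serving the last known good value and telling the user it may be stale. A minimal sketch under that assumption; `CachedFallbackReader` and its in-memory cache are hypothetical, and a real implementation would also bound cache size and age.

```python
from typing import Any, Callable, Dict, Tuple


class CachedFallbackReader:
    """Serves the last good value when the live read fails."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self._fetch = fetch
        self._last_good: Dict[str, Any] = {}

    def read(self, key: str) -> Tuple[Any, bool]:
        """Return (value, is_stale); is_stale is True when served from cache."""
        try:
            value = self._fetch(key)
        except Exception:
            if key not in self._last_good:
                raise  # nothing cached: surface the failure to the caller
            return self._last_good[key], True
        self._last_good[key] = value
        return value, False
```

The `is_stale` flag is what drives user communication: the UI can show a banner such as "showing saved data" instead of an error page.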
7. Monitoring.
- Salesforce Trust for platform status.
- Synthetic monitoring for key user journeys.
- Real-user monitoring for actual experience.
- Alerting on degradation.
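Alerting on degradation, rather than only on hard failure, can be sketched as a rolling window over synthetic check results. The class name, window size, and thresholds below are illustrative assumptions; a slow response is counted the same as a failed one.

```python
from collections import deque
from typing import Deque


class DegradationAlert:
    """Fires when too many recent synthetic checks failed or were too slow."""

    def __init__(self, window: int = 5, max_bad_rate: float = 0.4,
                 max_latency_s: float = 2.0) -> None:
        self.max_bad_rate = max_bad_rate
        self.max_latency_s = max_latency_s
        self._results: Deque[bool] = deque(maxlen=window)

    def record(self, ok: bool, latency_s: float) -> bool:
        """Record one check result; return True if an alert should fire."""
        self._results.append(ok and latency_s <= self.max_latency_s)
        if len(self._results) < (self._results.maxlen or 0):
            return False  # not enough data yet to judge
        bad = self._results.count(False)
        return bad / len(self._results) > self.max_bad_rate
```

Requiring a full window before alerting avoids paging on a single blip, at the cost of a slower first alert after startup.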
8. Incident response.
- Runbooks for common scenarios.
- Communication plan for users / customers.
- Postmortems and improvements.
Architectural patterns:
- Read replicas via external warehouse: analytics keep running on the last synced data during a Salesforce outage.
- Async writes via queue: user actions are buffered during slowness and applied once the target recovers.
- Health checks: components self-report; failed components are excluded from routing.
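The health-check pattern can be illustrated with a small router that skips unhealthy targets and falls back in a preferred order, which is also the core of an active-passive setup. Target names and check callables here are hypothetical; real checks would probe an endpoint with a timeout.

```python
from typing import Callable, Dict, List, Optional


def healthy_targets(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each health check; a check that raises counts as unhealthy."""
    healthy = []
    for name, check in checks.items():
        try:
            if check():
                healthy.append(name)
        except Exception:
            continue
    return healthy


def pick_target(checks: Dict[str, Callable[[], bool]],
                preferred_order: List[str]) -> Optional[str]:
    """Return the first healthy target in preference order, or None."""
    healthy = set(healthy_targets(checks))
    for name in preferred_order:
        if name in healthy:
            return name
    return None
```

With `preferred_order=["primary", "standby"]`, traffic goes to the standby only while the primary's check fails, and returns to the primary once it recovers.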
Common pitfalls:
- Assuming Salesforce is always up — design for the rare outage.
- Single Connected App for all integrations — one revocation kills everything.
- No backup strategy: Salesforce's native backup options may not meet your RTO or recovery point objective (RPO).
- No DR drill — backup that's never tested isn't a backup.
Senior architect insight: HA is a spectrum. Salesforce's 99.95% availability is plenty for most workloads. For mission-critical ones (banking, life-safety), additional layers (multi-org, external systems) may be justified. Match HA investment to actual need.
Don't over-engineer HA; the complexity itself becomes a failure mode.
