Salesforce has built-in observability that's adequate for small orgs and inadequate for enterprise.
Built-in:
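- Debug Logs: Apex execution detail. Enabled per user through trace flags; logs are size-capped and kept only for a short time.
- Login History: every login attempt. 6-month retention.
- Setup Audit Trail: metadata and configuration changes. 6 months.
- Field History Tracking: changes to tracked fields. 18 months in the org, up to 24 months via the API.
- Apex Exception Email: unhandled Apex exceptions emailed to the developer who last modified the code and to addresses listed in Setup.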
Beyond built-in:
1. Custom error log object.
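Error_Log__c capturing every caught exception, async failure and integration error. Persistent and reportable. An uncaught exception rolls back the whole transaction, including any log row inserted in it, so log from a catch block and handle the error there rather than rethrowing.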
```apex
public class Logger {
    // Writes one Error_Log__c row for a caught exception.
    public static void error(String className, String method, Exception e) {
        Error_Log__c log = new Error_Log__c(
            Class__c = className,
            Method__c = method,
            Message__c = e.getMessage(),
            Stack__c = e.getStackTraceString(),
            Timestamp__c = DateTime.now(),
            User__c = UserInfo.getUserId()
        );
        // allOrNone = false: a failed log insert must not throw a second exception.
        Database.insert(log, false);
    }
}
```
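A log row inserted inside a transaction disappears if that transaction later rolls back, which is exactly when the log matters most. One common workaround is to publish a platform event whose publish behaviour is set to "Publish Immediately" and let a subscriber trigger write the record. The sketch below assumes a hypothetical platform event `Error_Log_Event__e` with text fields mirroring the log object; the event, its fields, and the class and trigger names are illustrative and not part of the original design.

```apex
// Sketch only: Error_Log_Event__e and its fields are hypothetical and would
// need to be created with Publish Behavior = "Publish Immediately".
public class EventLogger {
    public static void error(String className, String method, Exception e) {
        Error_Log_Event__e evt = new Error_Log_Event__e(
            Class__c = className,
            Method__c = method,
            Message__c = e.getMessage(),
            Stack__c = e.getStackTraceString()
        );
        // Publishing is not tied to the transaction's commit, so a later
        // rollback does not discard the event.
        Database.SaveResult result = EventBus.publish(evt);
        if (!result.isSuccess()) {
            System.debug(LoggingLevel.ERROR, 'Log event not published: ' + result.getErrors());
        }
    }
}
```

```apex
// Subscriber: copies each event into a persistent Error_Log__c row.
trigger ErrorLogEventTrigger on Error_Log_Event__e (after insert) {
    List<Error_Log__c> logs = new List<Error_Log__c>();
    for (Error_Log_Event__e evt : Trigger.new) {
        logs.add(new Error_Log__c(
            Class__c = evt.Class__c,
            Method__c = evt.Method__c,
            Message__c = evt.Message__c,
            Stack__c = evt.Stack__c,
            Timestamp__c = evt.CreatedDate,
            User__c = evt.CreatedById // the user who published the event
        ));
    }
    Database.insert(logs, false);
}
```

The trade-off is that published events count against platform event limits, so very chatty code paths may still need sampling.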
2. Event Monitoring (Shield).
- Detailed runtime events: API calls, logins, report exports, file downloads, slow queries.
- Hourly file delivery.
- Stream to S3 / SIEM.
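Event log files are exposed through the `EventLogFile` object, so a quick way to see what is available before wiring up an export is to query it. The anonymous Apex below is a minimal sketch that lists yesterday's API event files; it assumes Event Monitoring is licensed and the running user holds the "View Event Log Files" permission.

```apex
// Sketch: list yesterday's API event log files and their sizes.
List<EventLogFile> files = [
    SELECT Id, EventType, LogDate, Interval, LogFileLength
    FROM EventLogFile
    WHERE EventType = 'API' AND LogDate = YESTERDAY
    ORDER BY LogDate
];
for (EventLogFile f : files) {
    System.debug(f.EventType + ' ' + f.LogDate + ' ' + f.Interval
        + ' ' + f.LogFileLength + ' bytes');
}
```

The CSV payload itself sits in the `LogFile` field. Files can be large, so an external collector that downloads them over the REST API is usually a better fit than reading them in Apex.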
3. External SIEM integration.
- Splunk / Datadog / Sumo Logic.
- Centralised observability across systems.
- Alerting on anomalies.
4. Application-level metrics.
- Custom counters via custom objects or platform events; keep thresholds and settings in Custom Metadata, which Apex cannot update with ordinary DML.
- Track user journeys, feature usage, performance.
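As a rough illustration of application-level counters, the sketch below buffers increments in memory and publishes them once per transaction. `App_Metric__e` and its fields `Metric_Name__c` and `Count__c` are hypothetical; a custom object would work just as well as the sink.

```apex
// Sketch: in-memory counters flushed as platform events.
// App_Metric__e, Metric_Name__c and Count__c are assumed names.
public class Metrics {
    private static Map<String, Integer> counters = new Map<String, Integer>();

    // Record one occurrence of a named metric, e.g. 'quote.created'.
    public static void increment(String name) {
        Integer current = counters.containsKey(name) ? counters.get(name) : 0;
        counters.put(name, current + 1);
    }

    // Publish one event per metric, then reset the buffer.
    public static void flush() {
        List<App_Metric__e> events = new List<App_Metric__e>();
        for (String name : counters.keySet()) {
            events.add(new App_Metric__e(
                Metric_Name__c = name,
                Count__c = counters.get(name)
            ));
        }
        if (!events.isEmpty()) {
            EventBus.publish(events);
        }
        counters.clear();
    }
}
```

Calling `Metrics.increment('quote.created')` from business logic and `Metrics.flush()` at the end of a trigger handler or controller action keeps the cost to one publish call per transaction.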
5. Integration health dashboards.
- Per integration: success rate, latency, error rate.
- Alert on threshold crossing.
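The per-integration numbers above reduce to simple arithmetic over a window of call samples. The following sketch computes success rate and a nearest-rank 95th-percentile latency and checks them against thresholds; the class name, the `Sample` shape and the threshold parameters are all illustrative assumptions.

```apex
// Sketch: health figures for one integration over a window of samples.
public class IntegrationHealth {
    public class Sample {
        public Boolean success;
        public Integer latencyMs;
        public Sample(Boolean success, Integer latencyMs) {
            this.success = success;
            this.latencyMs = latencyMs;
        }
    }

    // Fraction of successful calls (0 to 1); 1 when there are no samples.
    public static Decimal successRate(List<Sample> samples) {
        if (samples.isEmpty()) {
            return 1;
        }
        Decimal ok = 0;
        for (Sample s : samples) {
            if (s.success) {
                ok += 1;
            }
        }
        return ok / samples.size();
    }

    // Nearest-rank 95th percentile of latency; 0 when there are no samples.
    public static Integer p95LatencyMs(List<Sample> samples) {
        if (samples.isEmpty()) {
            return 0;
        }
        List<Integer> latencies = new List<Integer>();
        for (Sample s : samples) {
            latencies.add(s.latencyMs);
        }
        latencies.sort();
        // Integer form of ceil(0.95 * n).
        Integer rank = (95 * latencies.size() + 99) / 100;
        return latencies[rank - 1];
    }

    // True when either threshold is crossed.
    public static Boolean breaches(List<Sample> samples, Decimal minSuccessRate, Integer maxP95Ms) {
        return successRate(samples) < minSuccessRate
            || p95LatencyMs(samples) > maxP95Ms;
    }
}
```

Error rate is simply one minus the success rate, so a dashboard can derive it rather than store it separately.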
6. Synthetic monitoring.
- Periodic test transactions exercising critical paths.
- Catch outages before users notice.
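A synthetic check can be as small as a scheduled job that exercises one critical path and records a failure when it misbehaves. The sketch below probes an outbound integration and reuses the `Logger` class from earlier; the Named Credential `Order_API` and its `/health` path are hypothetical.

```apex
// Sketch: hourly probe of a critical integration path.
// 'callout:Order_API/health' is an assumed Named Credential and endpoint.
public class SyntheticCheck implements Schedulable {
    public class ProbeException extends Exception {}

    public void execute(SchedulableContext ctx) {
        // Scheduled Apex cannot call out directly, so hand off to a Queueable.
        System.enqueueJob(new Probe());
    }

    public class Probe implements Queueable, Database.AllowsCallouts {
        public void execute(QueueableContext ctx) {
            HttpRequest req = new HttpRequest();
            req.setEndpoint('callout:Order_API/health');
            req.setMethod('GET');
            req.setTimeout(10000); // milliseconds
            try {
                HttpResponse res = new Http().send(req);
                if (res.getStatusCode() != 200) {
                    Logger.error('SyntheticCheck', 'Probe',
                        new ProbeException('Health check returned ' + res.getStatusCode()));
                }
            } catch (CalloutException e) {
                // Timeouts and connection failures land here.
                Logger.error('SyntheticCheck', 'Probe', e);
            }
        }
    }
}
```

It could be scheduled with `System.schedule('Synthetic check', '0 0 * * * ?', new SyntheticCheck());`. A probe that runs inside Salesforce cannot report that the org itself is unreachable, so checks of login and page availability belong in an external tool.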
7. Real-user monitoring.
- Lightning page render time tracking.
- LWC component performance.
Alerting:
- Critical alerts -> pager / Slack / SMS.
- Important alerts -> email digest.
- Informational -> dashboard.
Don't alert on everything; people stop responding.
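The routing tiers and the "don't alert on everything" rule can be expressed as two small functions: one maps severity to a channel, the other suppresses repeats of the same alert inside a quiet window. This is a sketch with assumed names; the channel strings and the quiet-window length are placeholders.

```apex
// Sketch: severity-based routing plus simple repeat suppression.
public class AlertRouter {
    public enum Severity { CRITICAL, IMPORTANT, INFORMATIONAL }

    // Map a severity tier to its delivery channel.
    public static String channelFor(Severity level) {
        if (level == Severity.CRITICAL) {
            return 'pager';
        }
        if (level == Severity.IMPORTANT) {
            return 'email-digest';
        }
        return 'dashboard';
    }

    // Returns false when the same alert key already fired inside the quiet
    // window; otherwise records the send time and returns true.
    public static Boolean shouldSend(String key, Datetime sentAt,
            Map<String, Datetime> lastSent, Integer quietMinutes) {
        Datetime previous = lastSent.get(key);
        if (previous != null && sentAt < previous.addMinutes(quietMinutes)) {
            return false;
        }
        lastSent.put(key, sentAt);
        return true;
    }
}
```

Keeping the last-sent map outside the function (in a cache or a custom object) is what lets suppression work across transactions.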
Architecture pattern:
Salesforce
-> Event Monitoring stream -> Mulesoft/Kafka -> Splunk/Datadog
-> Custom Error_Log__c -> dashboard/report
-> Synthetic checks -> alerting
Pitfalls:
- No central observability: issues hide in different tools and nobody sees the whole picture.
- Alert fatigue: too many alerts, so all of them get ignored.
- No retention strategy: logs pile up and storage costs grow.
- Ignoring built-in tools: admins re-invent what is already there.
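For the retention pitfall, one possible approach is a Batch Apex job that deletes old `Error_Log__c` rows on a schedule. The sketch below uses the `Timestamp__c` field from the logger; the class name and the retention window passed in are assumptions to adapt to the org's own policy.

```apex
// Sketch: purge Error_Log__c rows older than a retention window.
public class ErrorLogPurgeBatch implements Database.Batchable<SObject> {
    private Integer retentionDays;

    public ErrorLogPurgeBatch(Integer retentionDays) {
        this.retentionDays = retentionDays;
    }

    // Select every log row older than the cutoff.
    public Database.QueryLocator start(Database.BatchableContext ctx) {
        Datetime cutoff = Datetime.now().addDays(-retentionDays);
        return Database.getQueryLocator(
            [SELECT Id FROM Error_Log__c WHERE Timestamp__c < :cutoff]
        );
    }

    // Delete the chunk and bypass the Recycle Bin so storage is freed.
    public void execute(Database.BatchableContext ctx, List<SObject> scope) {
        delete scope;
        Database.emptyRecycleBin(scope);
    }

    public void finish(Database.BatchableContext ctx) {}
}
```

Running `Database.executeBatch(new ErrorLogPurgeBatch(90), 2000);` from a scheduled job would keep roughly 90 days of logs; rows that must be kept longer can be exported to the external log store first.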
Architect role: define observability strategy from day one. Retrofitting is much harder than building in.
The senior insight: you can't fix what you can't see. Investment in observability pays back through faster incident resolution and avoided incidents.
