Cross-system analytics requires unified data, beyond what any single system can deliver.
Architectural pattern: data warehouse + BI layer.
1. Data ingestion.
- Salesforce — CDC + Mulesoft / native Snowflake connector / Heroku Connect.
- ERP — vendor-specific (SAP HANA replication, Oracle GoldenGate, etc.) or custom ETL.
- Marketing Cloud — APIs + ETL.
- Web analytics — Google Analytics export, Mixpanel API, Segment.
- Other — bespoke connectors as needed.
All flow into a central warehouse.
2. Data warehouse.
- Snowflake — most common modern choice.
- BigQuery — Google ecosystem.
- Redshift — AWS.
- Databricks — for data science workloads.
Warehouse holds ALL the data; processed for analytics.
3. Transformation.
- dbt (data build tool) — SQL-based transformation; declarative.
- Custom ETL — Python / Node / Spark.
- Mulesoft / iPaaS — for orchestration.
Transformation: clean, normalise, join across sources, build summary tables.
4. Storage tiers.
- Raw layer — exactly as ingested.
- Cleansed layer — standardised, deduped.
- Conformed layer — joined across sources.
- Mart layer — domain-specific (sales mart, marketing mart).
5. BI layer.
- Tableau — visualisation, exploration.
- Power BI — Microsoft ecosystem.
- Looker / Looker Studio — Google ecosystem.
- CRM Analytics — Salesforce-native (works on warehouse data).
- Custom dashboards in Salesforce or Heroku.
6. Governance.
- Data dictionary — every metric defined consistently.
- Lineage tracking — where does this metric come from?
- Access control — who sees what data.
- Quality monitoring — alerts on data drift.
7. Salesforce integration with analytics.
- CRM Analytics — pull warehouse data into Salesforce dashboards.
- External Objects — read warehouse from Salesforce records.
- Embedded dashboards in Lightning pages.
Architectural decisions:
1. Real-time vs batch:
- Most analytics: hourly / daily batch suffices.
- Real-time only when business decisions depend (fraud detection, inventory).
- Real-time costs more; justify the value.
2. Warehouse vs lake:
- Warehouse — structured, optimized for queries.
- Lake — raw data, schema-on-read.
- Modern hybrid: lakehouse (Snowflake, Databricks).
3. Salesforce as source vs sink:
- Most often Salesforce is source — data flows out for analytics.
- Sometimes sink — analytics insights flow back to enrich Salesforce records.
4. Reverse ETL.
- After analytics, push insights back to operational systems.
- "Customer health score" computed in warehouse, written back to Salesforce.
- Tools: Hightouch, Census.
5. Master data management.
- Customer master across systems requires deduplication.
- MDM tool (Informatica, Reltio) or warehouse-based mastering.
Common pitfalls:
- Salesforce reports trying to do warehouse work — slow, limited.
- No data ownership — quality drifts.
- Silos persist — data in warehouse but reports built per system.
- Stale data — sync intervals too long.
- Cost surprises — warehouse compute / storage adds up.
Senior architect insight: analytics is its own architecture domain. Don't conflate with operational systems. Both have their place; design each deliberately.
Salesforce Reports & Dashboards work for in-Salesforce analytics. Cross-system analytics needs a warehouse + BI layer. Don't try to make Salesforce do both.
The most senior framing: operational systems serve transactions; analytics systems serve decisions. Different shapes; different optimizations.
