Data migration architecture for legacy CRM -> Salesforce, several million records, 50+ source tables.
Phase 1: Discovery (2-4 weeks):
- Source inventory — every table, row count, field count, data quality assessment.
- Profile data — completeness, distribution, anomalies, duplicates.
- Dependencies — parent-child relationships, circular references.
- Target Salesforce model: the object and field model agreed during business discovery.
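
As a rough illustration of the profiling step above, the sketch below computes completeness, distinct values and duplicated values for one field. The function name `profile_column` and the sample rows are hypothetical; a real profile would read from the source database and cover every field.

```python
from collections import Counter

def profile_column(rows, field):
    """Return completeness, distinct count and duplicated values for one field."""
    values = [row.get(field) for row in rows]
    filled = [v for v in values if v not in (None, "")]
    counts = Counter(filled)
    return {
        "rows": len(values),
        "completeness": len(filled) / len(values) if values else 0.0,
        "distinct": len(counts),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }

# Hypothetical sample extracted from a source table.
rows = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": ""},
    {"id": "3", "email": "a@example.com"},
    {"id": "4", "email": "b@example.com"},
]
email_profile = profile_column(rows, "email")
```

Running the same function over every field of every table gives the completeness and duplicate figures that feed the cleansing estimate.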
Phase 2: Mapping (2-4 weeks):
- Field-level mapping spreadsheet: source.field -> target.field.
- Transformation rules — formatting, defaults, lookups, conditional logic.
- Picklist value mappings — source values -> Salesforce values.
- Owner mapping — source users -> Salesforce users.
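- External ID strategy: store each source primary key in an External ID field so loads can be re-run as upserts and child records can reference parents by source key.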
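
A minimal sketch of how the mapping spreadsheet might be expressed in code, assuming it has been exported into plain dictionaries. All field names (`cust_name`, `Status__c`, `Legacy_Id__c`) and picklist values are hypothetical placeholders, not part of any real schema.

```python
# Hypothetical mapping tables; in practice these come from the mapping spreadsheet.
FIELD_MAP = {
    "cust_name": "Name",
    "cust_status": "Status__c",
    "cust_id": "Legacy_Id__c",  # source primary key carried into an External ID field
}
PICKLIST_MAP = {"Status__c": {"A": "Active", "I": "Inactive"}}
DEFAULTS = {"Status__c": "Unknown"}

def map_record(source):
    """Apply field mapping, trimming, picklist translation and defaults."""
    target = {}
    for src_field, tgt_field in FIELD_MAP.items():
        raw = source.get(src_field)
        value = "" if raw is None else str(raw).strip()
        if tgt_field in PICKLIST_MAP:
            # Unmapped source values fall back to the default for that field.
            value = PICKLIST_MAP[tgt_field].get(value, DEFAULTS.get(tgt_field, ""))
        target[tgt_field] = value
    return target

mapped = map_record({"cust_name": " Acme ", "cust_status": "A", "cust_id": 42})
```

Keeping the rules as data rather than code makes it easier to review them with the business owners who signed off the spreadsheet.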
Phase 3: Tooling (1-2 weeks):
- Bulk API 2.0 for inbound data.
- Mulesoft / Talend / custom Python for transformations.
- Salesforce Data Loader for smaller migrations.
- Custom Apex for complex transformations.
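
For orientation, a Bulk API 2.0 ingest job follows three REST calls: create the job, upload CSV data, then mark the upload complete. The sketch below only builds those requests as dictionaries so the shape is visible; it sends nothing, and authentication, polling and error handling are left out. The API version and the helper names are assumptions.

```python
import json

API_VERSION = "v59.0"  # assumption: substitute the version your org uses

def create_job_request(sobject, external_id_field):
    """Request that opens an upsert ingest job keyed on an External ID field."""
    return {
        "method": "POST",
        "path": f"/services/data/{API_VERSION}/jobs/ingest",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "object": sobject,
            "operation": "upsert",
            "externalIdFieldName": external_id_field,
            "contentType": "CSV",
            "lineEnding": "LF",
        }),
    }

def upload_requests(job_id, csv_data):
    """Requests that upload the CSV and then close the job for processing."""
    job_path = f"/services/data/{API_VERSION}/jobs/ingest/{job_id}"
    return [
        {"method": "PUT", "path": f"{job_path}/batches",
         "headers": {"Content-Type": "text/csv"}, "body": csv_data},
        {"method": "PATCH", "path": job_path,
         "headers": {"Content-Type": "application/json"},
         "body": json.dumps({"state": "UploadComplete"})},
    ]
```

The job id passed to `upload_requests` is the `id` returned in the response to the create call. Middleware such as Mulesoft or Talend wraps the same sequence behind its own connectors.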
Phase 4: Iterative migration testing:
Pass 1: Tiny sample (10-100 rows). Validate end-to-end, mapping, error handling.
Pass 2: Subset (1-10% of volume). Performance check; identify scaling issues.
Pass 3: Full volume in sandbox. Time the run; rehearse.
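
One way to pick the Pass 2 subset, sketched under the assumption that every record carries a stable source key: hash the key and keep records whose bucket falls below the chosen percentage. Sampling children by their parent's key keeps relationships intact. The function name `in_sample` is hypothetical.

```python
import hashlib

def in_sample(key, percent):
    """Deterministically decide whether a source key belongs to a percent-sized sample."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent

# Sample accounts by their own key, and contacts by their parent account key,
# so every sampled contact has its account in the sample too.
accounts = [{"id": f"ACC-{i}"} for i in range(1000)]
sampled_accounts = [a for a in accounts if in_sample(a["id"], 10)]
```

Because the decision depends only on the key, the same records are selected on every rehearsal, and a 5% sample is always contained in the 10% sample.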
Phase 5: Cleansing (ongoing):
- Address quality issues identified.
- Deduplicate.
- Standardise formats.
- Fill required fields.
- Anonymise where required.
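
The standardise and deduplicate steps above can be sketched as follows. This is a simplified illustration: matching on a normalised email only, keeping the first record seen. Real survivorship rules are usually richer, and the helper names are hypothetical.

```python
import re

def normalise_email(value):
    """Trim whitespace and lower-case an email address."""
    return (value or "").strip().lower()

def normalise_phone(value):
    """Keep digits only; country-code handling is deliberately left out."""
    return re.sub(r"\D", "", value or "")

def deduplicate(records, key_func):
    """Keep the first record per key; return (kept, dropped)."""
    seen = set()
    kept, dropped = [], []
    for record in records:
        key = key_func(record)
        if key and key in seen:
            dropped.append(record)
            continue
        if key:
            seen.add(key)
        kept.append(record)  # records with an empty key are never treated as duplicates
    return kept, dropped

contacts = [
    {"id": "1", "email": " Ann@Example.com "},
    {"id": "2", "email": "ann@example.com"},
    {"id": "3", "email": ""},
]
kept, dropped = deduplicate(contacts, lambda r: normalise_email(r["email"]))
```

Returning the dropped records, rather than discarding them, gives the audit trail needed to explain count differences during reconciliation.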
Phase 6: Order of operations:
Migrate parents before children:
- Users.
- Reference data (Products, etc.).
- Accounts.
- Contacts.
- Opportunities, Cases.
- Activities, custom objects, files.
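
The parent-before-child order can be derived rather than hand-maintained. The sketch below uses Python's standard `graphlib` (3.9+) on a hypothetical dependency map; the object list is illustrative and should be replaced by the lookups found in discovery.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: object -> objects it looks up to (its parents).
DEPENDS_ON = {
    "User": set(),
    "Product2": set(),
    "Account": {"User"},
    "Contact": {"Account", "User"},
    "Opportunity": {"Account", "User"},
    "Case": {"Account", "Contact"},
    "Task": {"Contact", "Opportunity", "Case"},
}

def load_order(depends_on):
    """Return objects in an order where every parent precedes its children."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"circular reference: {exc.args[1]}") from exc

order = load_order(DEPENDS_ON)
```

A circular reference surfaces as an error instead of a silent mis-ordering. A common way to resolve one is to load the records with the offending lookup left blank and populate it in a second update pass.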
Phase 7: Cutover plan:
- Freeze source before final extract.
- Final extract at scheduled time.
- Transform in middleware.
- Load to Salesforce production.
- Validate — row counts, sample records, key reports.
- Reconciliation reports — source vs Salesforce totals.
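
A minimal reconciliation sketch, assuming row counts per object have already been collected from both sides and that intentionally skipped records are counted separately. The function name and the figures are hypothetical.

```python
def reconcile(source_counts, target_counts, expected_skips=None):
    """Compare per-object counts and report any unexplained difference."""
    expected_skips = expected_skips or {}
    report = []
    for obj in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(obj, 0)
        tgt = target_counts.get(obj, 0)
        skipped = expected_skips.get(obj, 0)
        report.append({
            "object": obj,
            "source": src,
            "target": tgt,
            "skipped": skipped,
            "unexplained": src - skipped - tgt,  # non-zero means investigate
        })
    return report

report = reconcile(
    {"Account": 1000, "Contact": 2500},
    {"Account": 990, "Contact": 2490},
    expected_skips={"Account": 10},
)
```

In this made-up example the Account difference is fully explained by deduplication, while ten Contacts are unaccounted for and would block sign-off.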
Phase 8: Post-migration:
- Audit — what migrated, what failed, what was skipped.
- Defect log — failed records for manual fix.
- Communication — users informed.
- Rollback plan — if migration fails late.
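
A defect log can be started from the failed-records file of a bulk load. The sketch below assumes the Bulk API 2.0 failed-results CSV layout, where `sf__Id` and `sf__Error` columns precede the original fields. The sample rows and the `Legacy_Id__c` field are made up for illustration, and the error text is only indicative.

```python
import csv
import io
from collections import Counter

def build_defect_log(failed_csv, key_field):
    """Return (defects, counts per error code) from a failed-results CSV."""
    rows = list(csv.DictReader(io.StringIO(failed_csv)))
    defects = [{"source_key": r[key_field], "error": r["sf__Error"]} for r in rows]
    # The text before the first colon is treated as the error code.
    by_code = Counter(d["error"].split(":", 1)[0] for d in defects)
    return defects, by_code

# Illustrative sample only; real files come from the job's failed results.
SAMPLE = (
    '"sf__Id","sf__Error","Name","Legacy_Id__c"\n'
    '"","REQUIRED_FIELD_MISSING:Required fields are missing: [Name]","","A-17"\n'
    '"","DUPLICATE_VALUE:duplicate value found","Acme","A-18"\n'
    '"","REQUIRED_FIELD_MISSING:Required fields are missing: [Name]","","A-19"\n'
)
defects, by_code = build_defect_log(SAMPLE, "Legacy_Id__c")
```

Grouping by error code separates systematic problems, which call for a mapping or cleansing fix and a re-run, from one-off records that can be corrected by hand.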
Performance considerations:
- Defer Sharing Calculations during bulk loads.
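- Disable validation rules / triggers for migration if needed, and re-enable them once the load is verified.
- Bulk API sizing: Bulk API 2.0 splits uploaded data into internal batches of 10,000 records automatically; with the original Bulk API, 10,000 records is the per-batch maximum.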
- Sandbox capacity — Full sandbox for realistic testing.
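
Large extracts usually have to be split before upload, because the bulk APIs cap how much data a single upload may carry (check the current limits documentation for the exact figure). The sketch below splits rows into CSV payloads under a caller-supplied byte budget; `chunk_csv` and `max_bytes` are hypothetical names.

```python
import csv
import io

def chunk_csv(header, rows, max_bytes):
    """Split rows into CSV strings, each with a header and at most max_bytes long."""
    def render(row):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        return buf.getvalue()

    head = render(header)
    head_size = len(head.encode("utf-8"))
    chunks, current, size = [], [head], head_size
    for row in rows:
        line = render(row)
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when this row would exceed the budget.
        if size + line_size > max_bytes and len(current) > 1:
            chunks.append("".join(current))
            current, size = [head], head_size
        current.append(line)
        size += line_size
    if len(current) > 1:
        chunks.append("".join(current))
    return chunks

chunks = chunk_csv(["Name"], [["a"], ["b"], ["c"]], max_bytes=9)
```

A single row larger than the budget still gets a chunk of its own, so oversized rows should be caught during profiling rather than here.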
Common pitfalls:
- Underestimating cleansing: it often accounts for 30-50% of the migration effort.
- Wrong order — children before parents fails.
- No reconciliation — silent data loss.
- No rollback plan — late failures catastrophic.
Senior architect insight: data migration is a project within a project. Treat it with rigor: dedicated team, dedicated timeline, dedicated tooling, dedicated reconciliation.
The senior framing: most Salesforce projects fail due to data migration issues, not configuration. Invest accordingly.
