Bad data corrupts every downstream feature. A data quality strategy spans prevention at the point of entry, ongoing cleansing and maintenance, and measurement.
1. Prevention (best leverage):
- Validation rules at every save — required fields, format checks, cross-field constraints (see the formula sketch after this list).
- Picklists, not free text — constrain value sets; use Global Picklist Value Sets so values stay consistent across objects.
- Lookup relationships, not text — foreign keys, not name strings.
- Field-level help — guide users on what to enter.
- Required at the API level — enforce requiredness on the field or via validation rules, not just the page layout, so integrations have no "optional via the API" loophole.
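
A minimal sketch of such a cross-field validation rule, assuming an Opportunity where Closed Won should always carry an Amount (the object, fields, and constraint are illustrative, not a prescribed rule):

```
/* Block an Opportunity from being saved as Closed Won without a positive Amount.
   The rule fires (blocks the save) when the formula evaluates to true. */
AND(
  ISPICKVAL(StageName, "Closed Won"),
  OR(ISBLANK(Amount), Amount <= 0)
)
```
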
2. Duplicate prevention:
- Matching Rules + Duplicate Rules on Lead, Contact, Account.
- Merge UI for users to consolidate dupes.
- External Id fields — stable keys that integrations use to upsert without creating dupes (sketch below).
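
A minimal Apex sketch of that upsert pattern, assuming a hypothetical ERP_Id__c external Id field on Contact (field name and data are illustrative):

```
// ERP_Id__c is a hypothetical custom field marked as an External Id (ideally Unique).
List<Contact> incoming = new List<Contact>{
    new Contact(ERP_Id__c = 'ERP-1001', LastName = 'Ng',   Email = 'a.ng@example.com'),
    new Contact(ERP_Id__c = 'ERP-1002', LastName = 'Osei', Email = 'k.osei@example.com')
};
// Match on the external Id; allOrNone = false so one bad row does not roll back the rest.
Database.UpsertResult[] results = Database.upsert(incoming, Contact.Fields.ERP_Id__c, false);
```

The same external-Id upsert is available to Data Loader and the SOAP/REST APIs, so middleware can stay idempotent without querying for matches first.
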
3. Enrichment:
- Third-party data — Clearbit, ZoomInfo, D&B Hoovers append firmographic data.
- Pardot/MCAE — form submissions and progressive profiling add detail to Lead records over time.
- Geocoding and address standardisation — add latitude/longitude and normalise address formats.
4. Cleansing (the initial clean-up):
- Profile current data — assess completeness, accuracy, consistency (see the profiling sketch after this list).
- Identify duplicates — surface and merge.
- Standardise formats — country codes, phone formats, capitalisation.
- Fill gaps — populate required fields with sensible defaults or via enrichment.
- Archive stale records — move old, inactive Leads/Contacts to a Big Object or an external archive.
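
Profiling does not need heavy tooling to get started; a few aggregate queries establish a baseline. A rough anonymous Apex sketch, with the field choices purely illustrative:

```
// Completeness baseline for a couple of key Account fields (field choices are examples).
// For large data volumes, run these as reports or in Batch Apex to stay inside query limits.
Integer total           = [SELECT COUNT() FROM Account];
Integer missingIndustry = [SELECT COUNT() FROM Account WHERE Industry = null];
Integer missingPhone    = [SELECT COUNT() FROM Account WHERE Phone = null];

if (total > 0) {
    Decimal industryPct = 100.0 * (total - missingIndustry) / total;
    Decimal phonePct    = 100.0 * (total - missingPhone) / total;
    System.debug('Industry filled: ' + industryPct.setScale(1) + '%');
    System.debug('Phone filled: ' + phonePct.setScale(1) + '%');
}
```
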
5. Ongoing maintenance:
- Daily duplicate checks — automated reports flagging recent dupes.
- Stale data alerts — flag Accounts not updated in the last 12 months (see the scheduled-job sketch after this list).
- Validation rule audit — periodically review which rules fire most (often signals where data is bad).
- Field utilisation audit — fields never populated (consider deprecating).
- Data quality dashboard — track completeness, duplicate counts, record age, and key-field fill rates.
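
One way to automate the stale-data alert is a small scheduled job (a report subscription or scheduled Flow works just as well). A hedged sketch, where the 12-month window and the notification step are assumptions:

```
global class StaleAccountCheck implements Schedulable {
    global void execute(SchedulableContext ctx) {
        // Accounts last touched before the start of the trailing 12-month window.
        Integer staleCount = [
            SELECT COUNT() FROM Account
            WHERE LastModifiedDate < LAST_N_MONTHS:12
        ];
        // Replace the debug line with an email or Chatter post to the data steward.
        System.debug('Accounts untouched for 12+ months: ' + staleCount);
    }
}
// Schedule weekly, e.g.:
// System.schedule('Stale account check', '0 0 6 ? * MON', new StaleAccountCheck());
```
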
6. Tooling:
- DemandTools (Validity) — bulk dedup, mass update and cleanup actions.
- Apsona — admin power tools.
- Custom Apex scripts for one-off cleanups (see the Batch Apex sketch below).
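
For a one-off cleanup that touches more rows than anonymous Apex comfortably handles, Batch Apex is the usual shape. A sketch that normalises a few common country spellings; the field and the mapping are illustrative assumptions:

```
global class CountryCleanupBatch implements Database.Batchable<SObject> {
    // Illustrative mapping; a real cleanse would drive this from a full reference list.
    private static final Map<String, String> FIXES = new Map<String, String>{
        'usa' => 'United States', 'us' => 'United States', 'uk' => 'United Kingdom'
    };

    global Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(
            'SELECT Id, MailingCountry FROM Contact WHERE MailingCountry != null');
    }

    global void execute(Database.BatchableContext bc, List<SObject> scope) {
        List<Contact> toUpdate = new List<Contact>();
        for (Contact c : (List<Contact>) scope) {
            String key = c.MailingCountry.trim().toLowerCase();
            if (FIXES.containsKey(key)) {
                c.MailingCountry = FIXES.get(key);
                toUpdate.add(c);
            }
        }
        update toUpdate;
    }

    global void finish(Database.BatchableContext bc) {}
}
// Run with: Database.executeBatch(new CountryCleanupBatch(), 200);
```

Batching keeps each transaction inside governor limits; pairing the cleanup with the prevention measures above (picklists, validation rules) stops the same drift from recurring.
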
7. Governance:
- Data steward role — owns data quality across the org.
- Field ownership — each field has a named owner who approves changes to it.
- Change requests — adding/removing fields goes through review.
Common pitfalls:
- Treating data quality as one-time — it degrades immediately without an ongoing process.
- No metrics — "we have a data quality problem" is qualitative; track quantitatively.
- Cleansing without prevention — cleaning the same dupes monthly without fixing the source.
Senior consultants build the data quality flywheel into the implementation, not bolt it on later.
