Agentforce / Einstein in production = AI agents reasoning, calling tools, interacting with Salesforce data, all reliably and at an acceptable cost.
Architecture components:
1. Atlas Reasoning Engine — Salesforce's LLM platform. Available models, prompt structure, response parsing.
2. Einstein Trust Layer — sits between your prompts and the LLM:
- PII masking — replaces sensitive data in prompts; un-masks responses.
- Audit logging — every prompt and response logged.
- Toxicity filtering — blocks inappropriate content.
- Bias detection — flags problematic patterns.
- Mandatory; cannot be bypassed.
3. Prompt Builder — reusable prompt templates with merge fields.
4. Apex integration — Apex calls AI via ConnectApi.GenerativeAi.generate or similar.
5. Custom tools / actions — Apex methods registered as agent tools. Agents call them to perform work.
6. Data Cloud — unified data feeding AI for grounding (RAG).
Production architecture decisions:
1. Cost management.
LLM calls cost money, and per-call cost adds up at volume.
- Track per-feature usage.
- Set per-user quotas if needed.
- Cache when appropriate — repeat queries don't need re-inference.
- Use lower-cost models when possible (smaller models for simpler tasks).
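The caching point can be sketched with Platform Cache. A minimal sketch, assuming an `AiResponses` org cache partition has been configured in Setup; the prompt-hash key scheme is illustrative:

```apex
public with sharing class AiResponseCache {
    // Assumed org cache partition name; must exist under Setup > Platform Cache.
    private static final String PARTITION = 'local.AiResponses';

    // Return a cached response for this prompt, or null on a miss.
    public static String get(String prompt) {
        Cache.OrgPartition part = Cache.Org.getPartition(PARTITION);
        return (String) part.get(cacheKey(prompt));
    }

    // Store a response with a TTL so stale answers eventually expire.
    public static void put(String prompt, String response, Integer ttlSecs) {
        Cache.OrgPartition part = Cache.Org.getPartition(PARTITION);
        part.put(cacheKey(prompt), response, ttlSecs);
    }

    // Platform Cache keys must be alphanumeric (max 50 chars), so hash the prompt.
    private static String cacheKey(String prompt) {
        Blob digest = Crypto.generateDigest('SHA-256', Blob.valueOf(prompt));
        return EncodingUtil.convertToHex(digest).left(50);
    }
}
```

A cache hit skips inference entirely, which is where the savings come from; the TTL bounds how stale a repeated answer can be.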
2. Async invocation.
LLM calls are slow (seconds). Don't block users.
- Fire-and-forget for background tasks.
- Pending-state UI — show "processing..." and update when the result arrives.
- Queueable Apex for orchestration.
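The async pattern above, sketched as a Queueable job. `generateSummary()` stands in for whatever prompt-template / LLM invocation you use, and `AI_Summary__c` is a hypothetical custom field:

```apex
// Async AI orchestration: enqueue the slow LLM call instead of blocking the user.
public with sharing class AiSummaryJob implements Queueable, Database.AllowsCallouts {
    private final Id caseId;

    public AiSummaryJob(Id caseId) {
        this.caseId = caseId;
    }

    public void execute(QueueableContext ctx) {
        Case c = [SELECT Id, Description FROM Case WHERE Id = :caseId];
        String summary = generateSummary(c.Description); // slow LLM call, off the user's thread
        c.AI_Summary__c = summary;  // hypothetical custom field
        update c;                   // "processing..." UI refreshes on record update
    }

    private String generateSummary(String text) {
        // Placeholder for the actual LLM invocation.
        return null;
    }
}
// Caller: System.enqueueJob(new AiSummaryJob(someCaseId));
```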
3. Fallback paths.
When AI service is down or slow:
- Cached response (with disclaimer about staleness).
- Pre-computed values (defaults).
- Graceful degradation — UI still works without AI.
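The fallback ladder can be sketched as live call, then cache, then a safe default. `getLiveAnswer()` and the cache helper are hypothetical stand-ins:

```apex
public with sharing class AiFallback {
    // Fallback sketch: try the live model, fall back to cache, then to a default.
    public static String answerWithFallback(String prompt) {
        try {
            return getLiveAnswer(prompt); // normal path
        } catch (CalloutException e) {
            String cached = getCachedAnswer(prompt); // hypothetical cache lookup
            if (cached != null) {
                // Staleness disclaimer, per the point above.
                return cached + '\n(Note: cached answer; may be out of date.)';
            }
            // Graceful degradation: a safe default keeps the UI functional.
            return 'AI assistance is temporarily unavailable.';
        }
    }

    private static String getLiveAnswer(String prompt) { return null; } // stub
    private static String getCachedAnswer(String prompt) { return null; } // stub
}
```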
4. Idempotency and non-determinism.
LLM output is non-deterministic: the same input may produce different outputs. Don't make downstream logic depend on exact-match outputs, and make retries safe — re-running the same request shouldn't duplicate side effects.
5. Audit and review.
- Every AI decision logged.
- Manually review a sample of outputs periodically.
- Track accuracy / quality metrics.
- Feedback loop into prompt improvement.
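A logging sketch for the audit point. `AI_Interaction__c` and its fields are a hypothetical custom object; adapt the names to your org:

```apex
public with sharing class AiAuditLogger {
    // One row per AI decision: enough to sample, review, and compute quality metrics.
    public static void log(String feature, String prompt, String response) {
        insert new AI_Interaction__c(
            Feature__c  = feature,
            Prompt__c   = prompt,      // consider masking PII before storing
            Response__c = response,
            User__c     = UserInfo.getUserId()
        );
    }
}
```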
6. Human-in-the-loop.
For high-stakes decisions:
- AI suggests; human approves.
- AI auto-decides only on low-stakes.
- Override mechanism for human correction.
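The stakes-based split above, as a sketch. The threshold, custom fields, and status values are all illustrative:

```apex
public with sharing class DiscountSuggestion {
    // Human-in-the-loop: the model proposes; a human approves anything high-stakes.
    public static void proposeDiscount(Opportunity opp, Decimal aiSuggested) {
        if (aiSuggested <= 5) {
            // Low stakes: auto-apply small discounts.
            opp.Discount__c = aiSuggested;
            opp.AI_Decision_Status__c = 'Auto-applied';
        } else {
            // High stakes: park the suggestion and route to a human approver,
            // who can accept, override, or reject it.
            opp.AI_Suggested_Discount__c = aiSuggested;
            opp.AI_Decision_Status__c = 'Pending review';
        }
        update opp;
    }
}
```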
7. Versioning prompts.
- Prompts in Custom Metadata (or version-controlled source).
- New prompt versions A/B tested before production.
- Rollback capability.
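Prompt retrieval from Custom Metadata might look like this. Templates deploy, diff, and roll back like any other metadata; `Prompt_Template__mdt` and its fields are hypothetical:

```apex
public with sharing class PromptStore {
    // Fetch the highest active version of a feature's prompt template.
    // Rollback = deactivate the current version; the previous one wins the ORDER BY.
    public static String getActivePrompt(String featureName) {
        List<Prompt_Template__mdt> rows = [
            SELECT Body__c, Version__c
            FROM Prompt_Template__mdt
            WHERE Feature__c = :featureName AND Active__c = true
            ORDER BY Version__c DESC
            LIMIT 1
        ];
        return rows.isEmpty() ? null : rows[0].Body__c;
    }
}
```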
8. RAG (Retrieval-Augmented Generation).
- Knowledge articles + Data Cloud + customer-specific data fed to LLM as context.
- Improves accuracy beyond base model knowledge.
- Architectural pieces: indexed knowledge base; embedding + vector search; prompt augmentation.
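The prompt-augmentation step reduces to string assembly once retrieval is done. A sketch, with `chunks` standing in for whatever your vector-search layer returns:

```apex
public with sharing class RagPromptBuilder {
    // Concatenate retrieved snippets into the prompt as grounding context.
    public static String buildGroundedPrompt(String question, List<String> chunks) {
        String context = String.join(chunks, '\n---\n');
        return 'Answer using ONLY the context below. If the answer is not in the '
             + 'context, say you do not know.\n\nContext:\n' + context
             + '\n\nQuestion: ' + question;
    }
}
```

The "say you do not know" instruction is what keeps the model from falling back to base-model guesses when retrieval comes up empty.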
9. Tool design.
When Agentforce calls Apex:
- Tools are well-named and well-described — the model selects tools by their names and descriptions.
- Parameters validated.
- Error handling explicit.
- Side effects documented.
- Audit trail.
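An agent tool following those rules might be sketched as an invocable Apex action. The refund scenario, object names, and the 500 USD cap are all illustrative:

```apex
public with sharing class RefundTool {
    public class Request {
        @InvocableVariable(required=true description='Order Id to refund')
        public Id orderId;
        @InvocableVariable(required=true description='Refund amount in USD')
        public Decimal amount;
    }

    // Label and description matter: the model chooses this tool based on them.
    @InvocableMethod(label='Issue Refund'
                     description='Issues a refund for an order, capped at 500 USD')
    public static List<String> issueRefund(List<Request> requests) {
        List<String> results = new List<String>();
        for (Request r : requests) {
            // Validate explicitly; never trust model-supplied parameters.
            if (r.amount == null || r.amount <= 0 || r.amount > 500) {
                results.add('Rejected: amount must be between 0 and 500.');
                continue;
            }
            // ... perform the refund (the side effect), write an audit record ...
            results.add('Refund of ' + r.amount + ' issued for order ' + r.orderId);
        }
        return results; // explicit result per request, including errors
    }
}
```

Returning an explicit error string instead of throwing gives the agent something it can relay to the user, rather than an opaque failure.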
10. Monitoring.
- Latency per call.
- Error rates.
- Cost per feature.
- User satisfaction with AI output.
- Adoption / abandonment.
Common pitfalls:
- AI-looking-for-a-problem syndrome: "let's add AI" without a specific use case.
- Underestimating data prep: AI needs clean data; data work is most of the project.
- No cost monitoring: surprise bills.
- Over-trust: AI mistakes accepted as correct.
- No fallback: when the AI service is down, the app is dead.
Senior architect insight: AI projects look glamorous; reality is mostly data engineering and prompt iteration. Most architectural decisions are about reliability, not the AI itself.
Production AI requires the same discipline as any other platform component: monitoring, fallbacks, audit, governance. Treat it as critical infrastructure, not a magic add-on.
