Career·June 6, 2026·42 min read

50 Salesforce QA Interview Questions & Answers (2026 Edition)

Test strategy, Apex coverage, Provar and Playwright, regression across the three annual releases, and how to test an Agentforce agent that isn't deterministic.

By Dipojjal Chakrabarti · Founder & Editor, Salesforce Dictionary · Last updated Jun 6, 2026

Three days before go-live, the full sandbox refreshes and your whole regression suite goes red. Not because anyone touched the code. The Spring '26 preview flipped a default, and forty of your UI tests were keyed to a button that moved.

That moment is the Salesforce QA job. You're testing on a platform that updates itself three times a year, where half the logic lives in clicks instead of code, and where, as of 2026, some of the features under test are AI agents that give a slightly different answer each time you run them. The interview reflects that. Less "what is a test case," more "how do you test something non-deterministic, on a platform you don't control, without a brittle suite that cries wolf every release."

Each question has a Short answer on its own line (the textbook version), then a "What to actually say" answer: the fuller version that shows you've shipped a release rather than only read about one. A few include the code or the worked example an interviewer wants to hear. The answers vary in length on purpose: a definition gets a sentence, a strategy question gets a paragraph.

50 QA questions across 5 sections, with behavioral and scenario rounds

Section 1 - Salesforce testing fundamentals (Q1-Q10)

This section establishes whether you understand what makes Salesforce different from a custom app. Interviewers use it to separate testers who happen to work on Salesforce from testers who understand the platform's constraints.

Q1. What makes testing Salesforce different from testing a custom web app?

Short: You don't own the platform. It's multitenant, metadata-driven, part clicks and part code, and it updates three times a year.

What to actually say: "I'm testing my org's configuration sitting on top of a platform Salesforce changes underneath me. So I test the seams: declarative automation plus Apex, governor limits at volume, and every seasonal release in a preview window. A web-app tester owns the whole stack and controls when it changes. A Salesforce tester owns the last mile and watches the platform like weather, because three times a year it shifts whether I'm ready or not."

The concrete consequence to name: I can't write a test that depends on the platform never changing, because it will. So my regression suite targets business outcomes that should stay stable, and I run it against the preview org to catch the changes that won't.

Q2. What types of testing apply to a Salesforce project?

Short: Apex unit tests, functional, integration, regression, UAT, performance and large-data-volume, security, and accessibility.

What to actually say: "Developers own Apex unit tests for coverage and logic. QA owns functional and integration across clouds, regression each release, performance at realistic data volume, security testing of sharing and field access, and increasingly AI-agent testing. UAT is the business validating against their own acceptance criteria. The mistake I watch for is teams treating Apex coverage as 'the testing' when it's only the bottom layer of the pyramid."

Each layer catches a different class of defect. Apex tests catch logic regressions cheaply, integration tests catch the field-mapping and contract bugs between systems, and UAT catches the "this technically works but it's not what we meant" gap. Naming what each layer is for shows you think in a test strategy, not a checklist.

Q3. What is shift-left testing, and how does it apply here?

Short: Test as early as possible, including during story refinement, instead of only at the end.

What to actually say: "I sit in refinement and write acceptance criteria with the business analyst before a line is built, because the cheapest place to catch a defect is in the requirement. Developers write meaningful Apex tests as they go, not the night before deploy. QA reviews the design for testability and edge cases early. Shifting left turns 'we found this in UAT and lost a sprint' into 'we caught this in refinement and it cost a conversation.'"

The practical version on a Salesforce team: I review the proposed automation in design, asking how it behaves at bulk, what happens on the error path, and which users should and shouldn't see the result. Those three questions in refinement prevent most of the defects I'd otherwise log in UAT.

Q4. Walk me through your environment and sandbox strategy.

Short: Develop in Developer sandboxes, integrate and QA in a Partial Copy, run UAT in a Full sandbox, then deploy to production.

What to actually say: "Code and config flow from Developer sandboxes through a QA environment to UAT and then production, each environment closer to production data than the last. I run functional tests in a Partial Copy with representative sample data, and UAT in a Full sandbox so the business tests against real volume and real edge-case records. The sandbox types guide lays out which tier fits which stage and why the refresh cadence matters."

The detail that shows experience: the Full sandbox is where I do performance and large-data-volume testing, because it's the only environment with production-scale data. A bug that only appears at ten million rows is invisible in a Developer sandbox with a hundred.

Refresh cadence is the other half of the strategy. A Full sandbox can be refreshed at most once every 29 days, so I plan testing windows around it and watch for a stale UAT environment drifting away from production. A Partial Copy can be refreshed every five days, which suits an active QA cycle. The trap is a long-lived sandbox where both config and data have drifted, so a passing test there proves nothing about production. When that happens I push for a refresh before any sign-off testing, because validating against a fictional environment is a polite way to waste a week.

Q5. How do you manage test data?

Short: A test data factory in Apex for unit tests, and seeded or sampled data (Data Loader, sandbox templates) for QA environments.

What to actually say: "Apex tests build their own data through a shared factory class so they don't depend on org data and can't be broken by someone deleting a record. For QA environments I use Partial Copy templates to pull a representative sample, then top up specific edge-case records with Data Loader: the international address, the account with ten thousand contacts, the contract that spans a fiscal boundary. Tests that depend on data someone might delete are tests that fail for the wrong reason."

A factory pattern is worth describing: one class with methods like createAccount() and createOpportunities(count) that every test calls, so when a required field is added to Account, I fix one method instead of fifty test classes. Centralized test data is the difference between a small schema change costing one edit and costing fifty.
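A minimal sketch of such a factory, using the two method names mentioned above. The class name TestDataFactory, the field values, and the choice to insert inside the factory are assumptions made for illustration; a real org will have its own required fields.

```apex
// Hypothetical shared factory. Marked @isTest so it does not count toward
// code coverage and can only be called from test code.
@isTest
public class TestDataFactory {

    // Builds and inserts one valid Account. When a new required field is
    // added to Account, this is the single place that needs to change.
    public static Account createAccount() {
        Account acct = new Account(Name = 'Test Account');
        insert acct;
        return acct;
    }

    // Builds `count` Opportunities under a fresh Account and inserts them
    // in one DML statement, so bulk tests stay cheap on limits.
    public static List<Opportunity> createOpportunities(Integer count) {
        Account acct = createAccount();
        List<Opportunity> opps = new List<Opportunity>();
        for (Integer i = 0; i < count; i++) {
            opps.add(new Opportunity(
                Name = 'Test Opp ' + i,
                AccountId = acct.Id,
                StageName = 'Prospecting',
                CloseDate = Date.today().addDays(30)
            ));
        }
        insert opps;
        return opps;
    }
}
```

A test then asks for what it needs in one line, for example `List<Opportunity> opps = TestDataFactory.createOpportunities(200);`, and never constructs an Account by hand.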

Q6. How does testing declarative automation differ from testing Apex?

Short: You can't unit-test a Flow the way you unit-test Apex, so declarative logic leans more on functional and integration tests.

What to actually say: "A record-triggered Flow has no Apex test wrapping it, so I verify it by creating and updating records and asserting the outcome, ideally at bulk to catch the limit problems. Admins can change a Flow without a deploy in some setups, so I keep regression coverage on the business outcome rather than the implementation. The risk with clicks-built automation is that it changes quietly, between releases, without a pull request to review."

There is Flow testing tooling now: the built-in Flow test feature lets you define inputs and assert outputs for record-triggered Flows, and I use it where it fits. But the bulk behavior and the cross-object side effects still need functional tests that create real records and check what happened.

Q7. How do you decide what to automate versus test manually?

Short: Automate stable, high-value, repetitive paths. Test new, exploratory, or rapidly changing areas by hand.

What to actually say: "I automate the regression core: the flows that must never break and that I'll run every release, because that's where automation pays back. I keep exploratory testing and first-pass validation of new features manual, because automating something that's still changing just creates maintenance churn. My rule of thumb is that automation earns its place when a test will run unchanged across more than two releases."

The economic framing lands in interviews: an automated test has a build cost and a per-release maintenance cost, and it only pays off if it runs enough times without rework. A brittle UI test that breaks every release can cost more to maintain than running it by hand would.
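That trade-off can be written as a rough break-even inequality. The symbols are introduced here for illustration and are not from any standard model: B is the one-time build cost, m the maintenance cost per release, h the cost of running the check by hand once per release, and n the number of releases the test survives.

```latex
B + n\,m \;<\; n\,h
\quad\Longleftrightarrow\quad
n \;>\; \frac{B}{h - m}
\qquad (h > m)
```

With purely illustrative numbers, B = 3 hours, m = 0.5 hours, and h = 2 hours give n > 2, so the test pays back from its third release. If m is as large as h, which is the brittle UI test that needs rework every release, the inequality never holds and the manual run stays cheaper.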

A concrete example of what I keep manual: a brand-new Experience Cloud portal in its first sprint, where the layout and components change every few days. Automating it early means rewriting selectors faster than the feature stabilizes. I test it by hand until the design settles, then automate the handful of journeys that will live in regression. The lead-to-opportunity conversion that's run the same way for three years is the opposite case, and the first thing I automate, because it's stable, high-value, and I run it every release.

Q8. What is a traceability matrix, and why keep one?

Short: A mapping from requirements to test cases to defects, so you can prove coverage.

What to actually say: "It answers the question every project lead eventually asks: is every acceptance criterion tested, and what's the status of each. When a stakeholder asks 'are we safe to ship,' the matrix turns that into a defensible yes or no instead of a gut feel. It's also how I scope regression: when a requirement changes, the matrix tells me exactly which test cases to rerun, so I'm not retesting the whole app for a one-field change.

"In practice the matrix lives in the test-management tool or the work tracker, linked to stories and defects, rather than in a spreadsheet someone forgets to update. I keep it light enough that updating it is part of closing a story, because a traceability matrix nobody maintains is worse than none: it gives false confidence. The payoff shows up at audit time and at release time, when 'prove this requirement is tested' becomes a quick query instead of a scramble through people's memories."

Q9. How do you handle the three annual Salesforce releases?

Short: Test against the preview sandbox during the release window, run regression, and flag breaking changes before the platform auto-upgrades.

What to actually say: "Each Spring, Summer, and Winter release lands on a preview sandbox weeks before it hits production. I run my regression suite there, read the release notes specifically for deprecations and changed defaults, and file anything that breaks while there's still time to fix it. The org upgrades on Salesforce's schedule, not mine, so the preview window is the one chance to catch a platform change before it's live for users."

This is the scenario in the opening: the suite goes red not because the team changed anything, but because the platform did. A QA team without a preview-window habit discovers those changes in production.

Q10. What's your definition of done for a QA-signed story?

Short: Acceptance criteria met, functional and regression passing, defects resolved or accepted, and evidence attached.

What to actually say: "Acceptance criteria verified, no open high-severity defects, regression green for the area I touched, and test evidence linked to the story. 'It works on my screen' is not done. Done is documented, repeatable by someone else, and traceable back to the requirement. If I signed off, I can show what I tested and what the result was."

Section 2 - Apex testing & code coverage (Q11-Q20)

Even non-coding QA candidates are expected to read and reason about Apex tests in 2026, because tests are the contract the pipeline enforces. This section is where you show you understand that coverage is a floor, not a goal.

Q11. Why does Salesforce require 75% code coverage, and what's the catch?

Short: 75% org-wide is the deployment gate. The catch is that coverage measures lines run, not behavior verified.

What to actually say: "You can't deploy to production under 75% org-wide coverage, and each trigger needs some coverage. But a test that executes code with zero assertions hits the number and proves nothing. As QA I read tests for assertions, not the percentage. A codebase at 90% coverage with assertion-free tests is less safe than one at 78% with tests that actually check outcomes, and saying that out loud signals you understand what coverage does and doesn't buy."
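The gap between coverage and verification is easy to show side by side. In this sketch DiscountCalculator is a hypothetical class included only so the example is self-contained. The first test method covers every line and would still pass if the discount logic were wrong; the other two fail when the behavior changes.

```apex
// File: DiscountCalculator.cls (hypothetical class under test)
public class DiscountCalculator {
    public static Decimal rateFor(String tier) {
        if (tier == 'Gold') {
            return 0.10;
        }
        return 0.0;
    }
}

// File: DiscountCalculatorTest.cls
@isTest
private class DiscountCalculatorTest {

    // Runs both branches, so every line of rateFor is covered,
    // but nothing is checked: this passes whatever rateFor returns.
    @isTest
    static void coverageOnly() {
        DiscountCalculator.rateFor('Gold');
        DiscountCalculator.rateFor('Silver');
    }

    // Same lines covered, but now a wrong rate turns the test red.
    @isTest
    static void goldTierGetsTenPercent() {
        Assert.areEqual(0.10, DiscountCalculator.rateFor('Gold'),
            'Gold tier should get a 10% discount');
    }

    @isTest
    static void otherTiersGetNoDiscount() {
        Assert.areEqual(0.0, DiscountCalculator.rateFor('Silver'),
            'Non-Gold tiers should get no discount');
    }
}
```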

Q12. What does a good Apex test class look like?

Short: @isTest class, @testSetup data, real assertions, bulk records, no reliance on org data.

What to actually say: "Clean data setup in @testSetup, an assertion on the actual outcome with a message, and a bulk case at 200 records to prove it survives a data load. I want to see the negative path tested too, not the happy case alone."

@isTest
private class LeadRouterTest {
    @testSetup
    static void setup() {
        insert new Account(Name = 'Acme');
    }
    @isTest
    static void routesBulkLeadsToOwner() {
        List<Lead> leads = new List<Lead>();
        for (Integer i = 0; i < 200; i++) {
            leads.add(new Lead(LastName = 'L' + i, Company = 'Acme'));
        }
        Test.startTest();
        insert leads;
        Test.stopTest();
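        // OwnerId is never null on a Lead (it defaults to the inserting user), so compare
        // against that default. Assumes the router reassigns leads to another user or queue.
        Integer routed = [SELECT COUNT() FROM Lead WHERE OwnerId != :UserInfo.getUserId()];
        Assert.areEqual(200, routed, 'Every lead should be routed away from the default owner');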
    }
}

When I review a test like this, I check three things: does it assert a specific expected value, does it run at bulk, and does it build its own data. A test missing any of those is a coverage line, not a safety net.

Q13. What is @testSetup and why use it?

Short: A method that creates test records once, shared across every test method in the class.

What to actually say: "It runs once to create the data, and each test method gets a fresh copy rolled back after it runs, so methods stay isolated but I write the setup once. It's faster than rebuilding records in every method and it keeps the data consistent across tests. The isolation matters: one method can't pollute another's data, which is what makes a failure point at the real cause instead of a side effect from the previous test."

One caveat I mention: @testSetup data is created once and rolled back after each method, but if a method commits something the framework can't roll back, it can leak across methods. For the vast majority of cases it's the right tool, and it's the first thing I look for when a test class runs slowly, because rebuilding the same records in twenty methods is pure waste.
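The per-method rollback can be demonstrated with standard objects only. This is a sketch that assumes the org has no validation or duplicate rules blocking a bare Account insert; the class and record names are made up.

```apex
@isTest
private class TestSetupIsolationTest {

    // Runs once for the class; every test method starts from this data.
    @testSetup
    static void setup() {
        insert new Account(Name = 'Shared Fixture');
    }

    // Deletes the fixture. The delete is rolled back when the method ends.
    @isTest
    static void methodThatDeletesTheFixture() {
        delete [SELECT Id FROM Account WHERE Name = 'Shared Fixture'];
        Assert.areEqual(0,
            [SELECT COUNT() FROM Account WHERE Name = 'Shared Fixture'],
            'Fixture should be gone inside this method');
    }

    // Still sees the fixture, whichever order the two methods run in.
    @isTest
    static void methodThatStillSeesTheFixture() {
        Assert.areEqual(1,
            [SELECT COUNT() FROM Account WHERE Name = 'Shared Fixture'],
            'Each method should start from the original @testSetup data');
    }
}
```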

Q14. What do Test.startTest() and Test.stopTest() do?

Short: They give the code under test a fresh set of governor limits, and force async work to finish at stopTest.

What to actually say: "I put the action I'm testing between them. They reset the governor limits so my setup data creation doesn't eat into the limits I'm measuring, and any asynchronous work queued inside, a Queueable, a future method, a batch, runs synchronously at stopTest so I can assert on its result. Testing async without them produces a green test that never actually ran the job, which is one of the most common false-confidence bugs in an Apex test suite."
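A small sketch of the async behavior, using a Queueable defined inside the test class so nothing outside the example is needed. RenameJob and the record names are hypothetical, and the sketch assumes no org automation rewrites Account names.

```apex
@isTest
private class AsyncFlushTest {

    // Hypothetical job: renames the only Account in the test context.
    public class RenameJob implements Queueable {
        public void execute(QueueableContext ctx) {
            Account acct = [SELECT Id FROM Account LIMIT 1];
            acct.Name = 'Renamed';
            update acct;
        }
    }

    @isTest
    static void queueableHasRunByStopTest() {
        insert new Account(Name = 'Original');

        Test.startTest();
        System.enqueueJob(new RenameJob()); // queued, not yet executed
        Test.stopTest();                    // queued async work runs here

        // Safe to assert on the job's effect only after stopTest.
        Assert.areEqual('Renamed', [SELECT Name FROM Account LIMIT 1].Name,
            'The queued job should have run by the time stopTest returns');
    }
}
```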

The limit reset is worth making concrete. If the test method spends 120 DML statements building fixtures before the code under test runs, and the action isn't wrapped in startTest, those statements count against the same 150-statement DML limit the code is using, so a test can fail for setup it never meant to measure. Wrapping the real action in Test.startTest() and Test.stopTest() hands it a clean budget, so the limits I observe belong to the code under test, not to my fixtures.
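The reset can be observed directly with Limits.getDmlStatements(). The sketch below deliberately issues one DML statement per fixture record to make the counter visible; real setup code would bulkify. The first check is written as "at least five" because triggers or flows in a given org may add DML statements of their own.

```apex
@isTest
private class LimitResetTest {

    @isTest
    static void startTestHandsOverAFreshDmlBudget() {
        // Fixture setup: five separate DML statements (unbulkified on purpose).
        for (Integer i = 0; i < 5; i++) {
            insert new Account(Name = 'Fixture ' + i);
        }
        Assert.isTrue(Limits.getDmlStatements() >= 5,
            'Setup should have consumed at least five DML statements');

        Test.startTest();
        // Inside the block the counter starts again from zero,
        // so fixture DML no longer counts against the code under test.
        Assert.areEqual(0, Limits.getDmlStatements(),
            'startTest should reset the DML statement count');
        insert new Account(Name = 'Under test');
        Test.stopTest();
    }
}
```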

Q15. How do you test a trigger?

Short: One record, a bulk batch of 200, and the negative case.

What to actually say: "Single record proves the basic logic. Two hundred records proves it's bulk-safe and doesn't hit governor limits on a data load. The negative case proves it doesn't fire when it shouldn't, which is the half people skip. A trigger tested only with one record is the classic gap that passes every test in the sandbox and then throws 'Too many SOQL queries' on the first real import. The trigger framework guide shows the code under test."

The negative case is the one I push hardest on in review. A trigger that fires correctly when it should is only half-tested; I also want proof it stays quiet when the condition isn't met, because a trigger that fires too eagerly is as much a defect as one that never fires. The bulk test and the negative test together are what turn "it works" into "it works and it doesn't misfire."
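The three cases can be sketched against a deliberately tiny trigger. Both the trigger and its rule (stamp NextStep on opportunities over 50,000) are hypothetical, included only so the tests have something to exercise; the sketch also assumes no other automation in the org touches NextStep.

```apex
// File: OpportunityFlag.trigger (hypothetical trigger under test)
trigger OpportunityFlag on Opportunity (before insert) {
    for (Opportunity opp : Trigger.new) {
        if (opp.Amount != null && opp.Amount > 50000) {
            opp.NextStep = 'VP approval';
        }
    }
}

// File: OpportunityFlagTest.cls
@isTest
private class OpportunityFlagTest {

    // Builds an unsaved Opportunity with the standard required fields.
    private static Opportunity build(Decimal amount) {
        return new Opportunity(Name = 'Deal', StageName = 'Prospecting',
            CloseDate = Date.today().addDays(30), Amount = amount);
    }

    // Single record: proves the basic logic.
    @isTest
    static void flagsSingleLargeDeal() {
        Opportunity opp = build(60000);
        insert opp;
        opp = [SELECT NextStep FROM Opportunity WHERE Id = :opp.Id];
        Assert.areEqual('VP approval', opp.NextStep,
            'A deal over 50,000 should be flagged');
    }

    // Bulk: proves the trigger survives a 200-record batch.
    @isTest
    static void flagsTwoHundredLargeDeals() {
        List<Opportunity> opps = new List<Opportunity>();
        for (Integer i = 0; i < 200; i++) {
            opps.add(build(60000));
        }
        Test.startTest();
        insert opps;
        Test.stopTest();
        Assert.areEqual(200,
            [SELECT COUNT() FROM Opportunity WHERE NextStep = 'VP approval'],
            'Every large deal in the batch should be flagged');
    }

    // Negative: proves the trigger stays quiet below the threshold.
    @isTest
    static void leavesSmallDealAlone() {
        Opportunity opp = build(1000);
        insert opp;
        opp = [SELECT NextStep FROM Opportunity WHERE Id = :opp.Id];
        Assert.isNull(opp.NextStep, 'A small deal should not be flagged');
    }
}
```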

Q16. How do you test code that makes an HTTP callout?

Short: Register an HttpCalloutMock with Test.setMock() to return a canned response.

What to actually say: "Callouts aren't allowed in tests, so I supply a mock for the success path and a separate one for an error like a 500 or a timeout, then assert my code handles both. Testing only the happy path is how a partner API's bad day becomes a production incident, because the error branch was never exercised."

Test.setMock(HttpCalloutMock.class, new OrderApiMock());
Test.startTest();
OrderService.sync(orderId);
Test.stopTest();
Assert.areEqual('Synced', [SELECT Status__c FROM Order__c WHERE Id = :orderId].Status__c, 'Order should sync');

The follow-up worth pre-empting: for several callouts in one transaction I use a multi-response mock so the test mirrors the real sequence of calls, rather than returning the same canned body to every endpoint.
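The snippet above registers an OrderApiMock without showing it. One possible shape is sketched here; the constructor arguments and the JSON bodies are assumptions, since the real partner API's response format isn't given.

```apex
// Hypothetical mock for the order API. The no-argument constructor gives
// the success path; the two-argument one lets a test simulate a failure.
@isTest
public class OrderApiMock implements HttpCalloutMock {
    private Integer statusCode;
    private String body;

    public OrderApiMock() {
        this(200, '{"status":"Synced"}'); // assumed success payload
    }

    public OrderApiMock(Integer statusCode, String body) {
        this.statusCode = statusCode;
        this.body = body;
    }

    // Called by the platform in place of the real HTTP request during a test.
    public HttpResponse respond(HttpRequest req) {
        HttpResponse res = new HttpResponse();
        res.setHeader('Content-Type', 'application/json');
        res.setStatusCode(statusCode);
        res.setBody(body);
        return res;
    }
}
```

The error-path test then registers `new OrderApiMock(500, '{"error":"upstream failure"}')` and asserts that the order is left in a recoverable state. A multi-response version could branch on `req.getEndpoint()` inside respond() to return a different body per call.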

Q17. How should assertions be written?

Short: Use the Assert class with a message that states the expectation.

What to actually say: "Every assertion carries a message, so a failure in the CI log tells you what broke without opening the code. One logical behavior per test method keeps a failure readable. The message is the specification: it records what the code was supposed to do, which is exactly what the next engineer needs when it breaks six months from now."

Assert.areEqual(expected, actual, 'Discount should be 10% for Gold tier');

Q18. What is a test data factory, and why centralize it?

Short: A reusable class that builds valid test records, so tests don't each reinvent setup.

What to actually say: "When a required field is added to Account, I fix one factory method instead of fifty test classes. It also keeps the data realistic and consistent across the suite, so tests don't quietly diverge in what 'a valid Account' means. Decentralized test data is the top reason a small schema change cascades into a hundred broken tests, and a factory is the cheapest insurance against that."

Q19. Why is SeeAllData=true an anti-pattern?

Short: It lets tests see live org data, so they pass or fail based on data they don't control.

What to actually say: "Tests should be hermetic: they build their own data and assert against it, so they give the same result in any org and any sandbox. SeeAllData=true couples a test to whatever happens to be in the org, which makes it flaky and environment-dependent, and it can fail in a fresh scratch org that has none of the data it silently assumed. I only accept it for the rare objects you genuinely can't create in a test, and I flag every use in review."

The failure mode is concrete and embarrassing: a test with SeeAllData=true passes for years in the main org because the data it silently relied on happens to exist, then fails the moment someone runs it in a fresh scratch org or a newly refreshed sandbox where that data is gone. Now a green suite goes red for a reason that has nothing to do with the change being tested, and someone burns a morning chasing it. Hermetic tests that build their own data never have that problem, which is why I treat SeeAllData=true as a smell to justify, not a convenience to reach for.

Q20. How do you test exceptions and negative paths?

Short: Trigger the error condition and assert the exception or the user-facing message.

What to actually say: "I force the bad input, catch the expected exception, and assert on its type or message. For a validation rule that should block a save, I assert the DML failed and check the error. The negative path is where the real defects hide, because the happy path is the one everyone manually clicked through during development. A suite that only covers success is a suite that's never seen the system fail."

A concrete pattern for a validation rule: I attempt the save that should be blocked, assert that a DmlException is thrown, and assert its message contains the rule's text, so a future reword of the rule is caught. For an integration, I force the partner to return a 500 through the mock and assert my code logs it and leaves the record recoverable rather than half-updated. The error path is where production incidents actually live, so it's where I spend my assertion budget.
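The blocked-save pattern looks like this in outline. To stay self-contained the sketch uses the required Name field on Account as a stand-in for a validation rule; with a real rule the status code would be FIELD_CUSTOM_VALIDATION_EXCEPTION, and the message check would look for the rule's own error text.

```apex
@isTest
private class BlockedSaveTest {

    @isTest
    static void saveWithoutNameIsRejected() {
        try {
            // Name is required on Account, so this insert must fail.
            insert new Account();
            // Reached only if the save was wrongly allowed.
            Assert.fail('Insert should have been blocked');
        } catch (DmlException e) {
            // Assert on the reason for the failure, not just that it failed.
            Assert.areEqual(StatusCode.REQUIRED_FIELD_MISSING, e.getDmlType(0),
                'Save should be blocked because a required field is missing');
        }
    }
}
```

The Assert.fail line matters: without it, a save that unexpectedly succeeds would skip the catch block and the test would pass with nothing verified.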

Section 3 - Automation frameworks & tools (Q21-Q30)

This section is where Salesforce-specific testing knowledge shows. Anyone can name Selenium; the signal is understanding why Salesforce UI automation is uniquely brittle and where to push tests instead.

Q21. Which UI automation tools work with Salesforce?

Short: Provar, Selenium, Playwright, Cypress, Tosca, and Copado Robotic Testing are the common ones.

What to actually say: "Provar is purpose-built for Salesforce and understands its metadata and page structure, so its locators survive Lightning changes better than generic ones. Selenium and Playwright are general-purpose, cheaper to staff, and more brittle on Lightning out of the box. Tosca and Copado Robotic Testing show up in larger enterprises. I pick based on the team's existing skills and how much Salesforce-specific resilience the project needs versus how much locator maintenance the team can absorb."

Q22. Why is Salesforce UI automation notoriously brittle?

Short: Dynamic element IDs, the LWC shadow DOM, and Lightning's frequent re-rendering break selectors.

What to actually say: "Lightning generates element IDs that change between renders, so a selector keyed to an ID shatters on the next load. Components live inside shadow DOM, which generic selectors can't pierce without extra work. And the three annual releases move things in the standard UI. So a suite built on generated IDs or DOM position is fragile by construction. The fix is stable, attribute-based or label-based locators and tooling that understands the component model."

The Salesforce test pyramid: Apex unit, API, UI automation, manual

The deeper point the pyramid makes: the answer to UI brittleness is partly to test less through the UI. Most logic can be verified faster and more reliably one layer down, at the API or Apex level, leaving the UI suite to cover only the genuine end-to-end journeys.

Q23. How do you write stable selectors for Lightning and LWC?

Short: Target stable attributes and labels, pierce shadow DOM deliberately, and avoid generated IDs.

What to actually say: "I locate by stable text, ARIA roles, or custom data attributes the developers agreed to keep, rather than generated IDs or XPath position. For LWC I use tooling that traverses shadow boundaries on purpose instead of fighting them. Provar's Salesforce-aware locators handle much of this, which is the main reason teams pay for it over open-source Selenium. Where I'm on Playwright, I lean on its built-in shadow-piercing and role-based locators."

A collaboration point worth raising: the most durable fix is asking developers to add stable data-* hooks to the components QA automates, which turns a brittle test into a stable one and costs the developer almost nothing.

Q24. Provar versus Selenium for Salesforce?

Short: Provar is Salesforce-aware and lower-maintenance. Selenium is free, flexible, and more brittle.

What to actually say: "Provar costs money but understands page layouts, field types, and metadata, so it survives Lightning changes with far less rework, and non-developers can build tests in it. Selenium is open and infinitely flexible, but you build all that Salesforce resilience yourself and maintain it. For a Salesforce-heavy shop running every seasonal release, Provar usually wins once you count the maintenance hours Selenium quietly consumes. For a team with strong engineers and a small Salesforce surface, Selenium or Playwright can be the right call."

Q25. How do you test at the API layer?

Short: Hit the REST and SOAP APIs directly with Postman or an automated suite, asserting on responses.

What to actually say: "API tests are faster and far less brittle than UI tests, so I push validation down there wherever the logic allows. I test the REST endpoints, composite requests, and any custom Apex REST services with Postman collections or a code-based suite running in the pipeline. The UI test then only has to prove the screen wires up, not re-verify the business logic, which keeps the slow, fragile UI layer thin."

Concretely, for a custom Apex REST service that returns an order summary, I write a Postman collection that posts a known order ID and asserts the JSON shape, the totals, and the error response for a bad ID, all without opening a browser. A composite request lets me create a parent and its children in one call and assert the whole graph came back right. These run in seconds in the pipeline and don't break when a button moves, which is exactly why I verify the logic here rather than through a UI test that has to log in and click.

Q26. What is data-driven testing, and how do you apply it?

Short: Drive one test with many input rows from an external data source.

What to actually say: "Instead of ten near-identical test cases, I parameterize one test with a table of inputs and expected outputs. It's how I cover pricing tiers, country-specific tax rules, or record-type variations without copy-pasting cases. Adding a new scenario becomes a data change, not a code change, which keeps the suite small and the coverage broad. It also makes the gaps visible, because the data table is a readable list of what's covered."

A concrete case: a discount engine with five customer tiers and three regions has fifteen combinations, and I'd never hand-write fifteen near-identical test methods. I put the inputs and expected discounts in a data table and run one parameterized test across all fifteen rows. When the business adds a tier, I add a row, not a method. The table doubles as documentation the business can read and confirm, which turns a test artifact into a shared source of truth about the rules.
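Apex has no built-in parameterized-test annotation, so one common way to get the same effect is a list of cases looped inside a single test method. In this sketch the discountFor method is a made-up stand-in for the real discount engine, the rates are invented, and the table is trimmed to four rows for brevity.

```apex
@isTest
private class DiscountTableTest {

    // Hypothetical stand-in for the engine under test: a percentage by tier,
    // plus two points for EMEA. A real test would call the production class.
    private static Decimal discountFor(String tier, String region) {
        Decimal rate = 0;
        if (tier == 'Gold') {
            rate = 10;
        } else if (tier == 'Silver') {
            rate = 5;
        }
        if (region == 'EMEA') {
            rate += 2;
        }
        return rate;
    }

    // One row of the data table: inputs plus the expected result.
    public class DiscountCase {
        public String tier;
        public String region;
        public Decimal expected;
        public DiscountCase(String tier, String region, Decimal expected) {
            this.tier = tier;
            this.region = region;
            this.expected = expected;
        }
    }

    @isTest
    static void discountMatchesTable() {
        // Adding a scenario means adding a row here, not a new method.
        List<DiscountCase> table = new List<DiscountCase>{
            new DiscountCase('Gold',   'EMEA', 12),
            new DiscountCase('Gold',   'AMER', 10),
            new DiscountCase('Silver', 'EMEA', 7),
            new DiscountCase('Bronze', 'AMER', 0)
        };
        for (DiscountCase c : table) {
            // The message names the row, so a failure points at the scenario.
            Assert.areEqual(c.expected, discountFor(c.tier, c.region),
                'Discount for ' + c.tier + ' / ' + c.region);
        }
    }
}
```

One trade-off of the loop: the first failing row stops the method, so later rows are not reported until it is fixed.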

Q27. How do you test integrations and middleware like MuleSoft?

Short: Test each system in isolation with mocks, then end to end with real connections in a staging environment.

What to actually say: "I verify the Salesforce side with mocked responses, verify the middleware mapping independently, then run an end-to-end test through MuleSoft against a sandbox to catch the integration-only bugs: field mapping, retries, idempotency, and error handling. Most integration defects live in the gaps between systems, not inside any one of them, so the end-to-end test against real connections is the one that finds the expensive bugs."

A specific thing I test: what happens when the same message arrives twice, because networks retry. If the integration isn't idempotent, a duplicate message creates a duplicate record, and that's a data-integrity defect that only a deliberately-repeated test will surface.
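On the Salesforce side, one common way to make an inbound message idempotent is to upsert on an External ID, and the duplicate-delivery test is then short. External_Key__c is a hypothetical External ID field on Account that would have to exist for this to compile, and the message key 'MSG-001' is invented.

```apex
@isTest
private class IdempotentUpsertTest {

    @isTest
    static void sameMessageTwiceCreatesOneRecord() {
        // Simulate the network delivering the same message twice.
        for (Integer delivery = 0; delivery < 2; delivery++) {
            Account incoming = new Account(
                Name = 'Acme',
                External_Key__c = 'MSG-001' // hypothetical External ID field
            );
            // Upsert matches on the external key: insert first, update second.
            upsert incoming External_Key__c;
        }

        Assert.areEqual(1,
            [SELECT COUNT() FROM Account WHERE External_Key__c = 'MSG-001'],
            'A redelivered message should update the record, not duplicate it');
    }
}
```

Swapping the upsert for a plain insert makes the same test fail with a count of two, which is the defect the repeated-message case is meant to catch.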

Q28. How do you run tests across the three release sandboxes?

Short: Maintain the suite once, run it against the current org and the preview org each release.

What to actually say: "The same suite runs against my normal QA sandbox and against the seasonal preview sandbox. The preview run is the early-warning system: anything that breaks there is a platform change I need to handle before the org auto-upgrades. I version the suite so a preview-only fix doesn't leak into the current branch before the release actually lands, which keeps the two timelines from contaminating each other."

The discipline that makes this work is treating the preview run as a tagged branch of the suite for the upcoming release, so a fix for Spring '26 behavior doesn't accidentally ship into the current production line before Spring '26 is live. When the release goes live for everyone, that branch becomes the mainline. Teams that skip this end up with tests that pass in preview and fail in production, with no obvious reason why.

Q29. Where does UI automation fit versus API and Apex tests?

Short: At the top of the pyramid: few, high-value, end-to-end journeys. Most checks belong lower, in API and Apex tests.

What to actually say: "The test pyramid holds on Salesforce. Apex unit tests are the broad, cheap, fast base. API and integration tests are the middle. UI automation is the thin top, reserved for the critical end-to-end journeys a user actually performs. Teams that invert it, automating everything through the UI, end up with a slow suite that breaks every release and that everyone eventually ignores, which is worse than having fewer, reliable tests."

Q30. How do you keep an automation suite maintainable as the org changes?

Short: Stable locators, a page-object or keyword structure, shared test data, and ruthless pruning.

What to actually say: "I centralize locators and reusable steps so a UI change is a one-line fix in one place, keep tests independent so one failure doesn't cascade, and delete tests that no longer earn their maintenance. An automation suite is a codebase and rots like one if neglected. The hardest discipline is deleting tests: a suite full of flaky, ignored tests gives false confidence, and pruning it down to reliable ones is an improvement even though the test count drops."

Section 4 - Functional, regression & UAT (Q31-Q40)

This is the craft of QA: writing cases, scoping regression intelligently, and running UAT so the business actually validates the solution. The depth here is in risk-based thinking.

Q31. What makes a good test case?

Short: Clear preconditions, specific steps, one expected result, and enough detail that someone else can run it.

What to actually say: "A good case has a single clear objective, the exact data and user context it needs, unambiguous steps, and one expected result. Vague cases like 'check the flow works' produce vague results and can't be handed to another tester. I write them so a new team member gets the same outcome I would, because a test case that only I can run isn't a test case, it's a memory."

The difference is concrete. "Verify the approval flow works" is a memory: the next tester doesn't know which record, which approver, or what "works" means. "As the EMEA sales manager, submit an opportunity over 50,000 for approval and confirm it routes to the VP, locks the record, and emails the submitter" is a case anyone can run and get the same result from. The specifics (the user, the data, and the expected outcome) are what make it repeatable instead of personal.

Q32. What's your regression strategy across seasonal releases?

Short: A maintained regression suite of the must-not-break paths, run on the preview sandbox and before every deploy.

What to actually say: "I keep a regression pack covering the critical business flows, automated where they're stable enough to pay back. It runs before each production deploy and against every seasonal preview. I expand it whenever a defect escapes to production, so the suite grows toward where the real risk has proven to be, rather than toward where I guessed it would be. That feedback loop is what makes regression coverage sharpen over time instead of bloating."

Q33. How do you scope regression for a given release?

Short: Risk-based: full coverage of changed areas, smoke coverage of the rest.

What to actually say: "I use the traceability matrix and the change set to find exactly what's touched, run deep regression there, and a lighter smoke pass everywhere else. Re-running the entire suite for a one-field change wastes the release window and trains the team to start regression too late. Risk-based scoping is how I finish regression before go-live instead of after, and it's a more defensible answer to 'did you test enough' than 'I ran everything.'"
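A minimal sketch of that scoping step, assuming the traceability matrix can be exported as a mapping from changed components to the regression suites that cover them. The component and suite names are hypothetical, and the shape of the matrix is an assumption, not a Salesforce export format.

```python
# Hypothetical traceability data: component -> regression suites that cover it.
TRACEABILITY = {
    "Opportunity.Discount__c": {"quote_pricing", "approval_routing"},
    "Flow.Opportunity_Approval": {"approval_routing"},
    "ApexClass.InvoiceService": {"billing"},
}
# Hypothetical full list of suites; anything not run deep gets a smoke pass.
ALL_SUITES = {"login", "lead_convert", "quote_pricing", "approval_routing", "billing"}


def scope_regression(changed_components):
    """Return (deep, smoke, unmapped) for a list of changed components."""
    deep = set()
    unmapped = []
    for component in changed_components:
        suites = TRACEABILITY.get(component)
        if suites is None:
            # A change with no mapped tests is a gap in the matrix, so surface it.
            unmapped.append(component)
        else:
            deep |= suites
    smoke = ALL_SUITES - deep
    return deep, smoke, unmapped


deep, smoke, unmapped = scope_regression(["Opportunity.Discount__c", "Layout.Account"])
print("deep:", sorted(deep))
print("smoke:", sorted(smoke))
print("unmapped:", unmapped)
```

The unmapped list is the useful part in practice: a changed component that traces to no test is a scoping risk worth raising before the release window, not after.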

Q34. How do you run and coordinate UAT?

Short: Business users test against acceptance criteria in a Full sandbox, with QA supporting and logging defects.

What to actually say: "I prepare scripts mapped to the acceptance criteria, seed realistic data so users test against situations they recognize, train the testers, then support them daily and triage what they find. UAT is the business confirming the solution solves their problem, not QA re-running functional tests through other people. My job during UAT is to make their path smooth, capture issues cleanly, and protect them from environment problems that aren't really defects."

Q35. Walk me through the defect lifecycle.

Short: New, triaged, assigned, fixed, retested, closed (or reopened).

What to actually say: "A defect is logged with reproduction steps, evidence, and environment, triaged for severity and priority, assigned to a developer, fixed, then I retest in the same environment and either close it or reopen with notes on what's still wrong. A defect report without reproduction steps isn't a defect report, it's a complaint, and the first thing I do with a vague one is reproduce it so the developer isn't guessing."

I also record the environment and the exact data on every defect, because a bug that reproduces in the Full sandbox but not in Partial is usually a data or config difference, and that detail saves the developer an hour of confusion. When I retest a fix, I retest in the same environment with the same steps that produced it, then I check around it for anything the fix might have disturbed, because a fix that introduces a new bug is a common and avoidable way for defects to bounce between resolved and reopened.
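The lifecycle above can be written down as a small state machine, which is roughly what a defect tracker's workflow enforces. This is a sketch only: the state names follow the short answer, and the rule that a reopened defect goes back to "assigned" is an assumption about one reasonable workflow.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    TRIAGED = "triaged"
    ASSIGNED = "assigned"
    FIXED = "fixed"
    RETESTED = "retested"
    CLOSED = "closed"
    REOPENED = "reopened"


# Allowed transitions. REOPENED -> ASSIGNED is an assumed convention.
ALLOWED = {
    State.NEW: {State.TRIAGED},
    State.TRIAGED: {State.ASSIGNED},
    State.ASSIGNED: {State.FIXED},
    State.FIXED: {State.RETESTED},
    State.RETESTED: {State.CLOSED, State.REOPENED},
    State.REOPENED: {State.ASSIGNED},
    State.CLOSED: set(),
}


def advance(current, target):
    """Move a defect to the target state, rejecting skipped steps."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.NEW
for step in (State.TRIAGED, State.ASSIGNED, State.FIXED, State.RETESTED, State.REOPENED):
    state = advance(state, step)
print(state.value)
```

The point of the explicit table is that "fixed" can never jump straight to "closed": the retest is a required step, not a courtesy.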

Q36. How do you prioritize and triage defects?

Short: Severity is impact; priority is urgency. They're independent.

What to actually say: "Severity measures how bad the defect is if it happens, priority measures how soon we must fix it, and they're independent axes. A typo on the login page is low severity but can be high priority because everyone sees it. A rare data-corruption bug is high severity even at low frequency. I own the severity call as QA and let the product owner set priority, and keeping the two separate is what stops triage meetings from going in circles."

Q37. How do you test reports, dashboards, and visibility?

Short: Test as different users, because sharing means different people legitimately see different numbers.

What to actually say: "I verify the report logic with a known data set first, then log in as users in different roles to confirm each sees the rows they should and no more. A 'wrong number' on a report is usually a sharing question, not a math question. Testing only as a System Administrator hides every visibility bug, because the admin sees everything, so I always test reports through the eyes of a regular user in each relevant role.

A concrete repro of the classic ticket: a sales rep reports that the pipeline dashboard shows a smaller number than their manager sees. Before touching the report, I confirm both use the same filters, then I "login as" each and compare the row counts. Nine times out of ten the rep simply can't see opportunities owned by peers, which is the sharing model working as designed, not a defect. Showing the stakeholder the two row counts side by side ends the "the report is broken" conversation fast."

Q38. How do you test sharing and field-level security?

Short: Log in as representative users in each profile and confirm record and field access match the design.

What to actually say: "I build a matrix of profiles and permission sets against the records and fields they should and shouldn't access, then verify it with real user logins or 'login as.' Security testing is its own discipline on Salesforce because the access model is layered: object, then field, then record. A gap in any layer is a leak. A quiet over-share, where a user can see records they shouldn't, is the most expensive defect to find late, often in an audit."
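One way to make that matrix checkable is to record the designed access and the access observed during each "login as" session, then diff them. The sketch below assumes three access levels and uses hypothetical personas and field names; how the observed values are collected is left out.

```python
# Designed access per (persona, field). Names are hypothetical examples.
EXPECTED = {
    ("Sales Rep", "Opportunity.Amount"): "edit",
    ("Sales Rep", "Opportunity.Margin__c"): "none",
    ("Sales Manager", "Opportunity.Margin__c"): "read",
}

RANK = {"none": 0, "read": 1, "edit": 2}


def access_gaps(expected, observed):
    """List every cell where observed access differs from the design."""
    gaps = []
    for (persona, field), want in expected.items():
        got = observed.get((persona, field), "none")
        if got != want:
            kind = "over-share" if RANK[got] > RANK[want] else "under-share"
            gaps.append((persona, field, want, got, kind))
    return gaps


# Values a tester might record while logged in as each persona.
observed = {
    ("Sales Rep", "Opportunity.Amount"): "edit",
    ("Sales Rep", "Opportunity.Margin__c"): "read",
    ("Sales Manager", "Opportunity.Margin__c"): "read",
}
for gap in access_gaps(EXPECTED, observed):
    print(gap)
```

Labeling each gap as an over-share or under-share matters for triage: an under-share produces a support ticket, while an over-share produces an audit finding.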

Q39. How do you validate a data migration?

Short: Reconcile counts, spot-check field mapping, verify relationships, and test the migrated data against the app.

What to actually say: "Row counts in versus out, sampled field-level comparison against the source, relationship integrity so there are no orphaned children, and a functional pass on migrated records to confirm the automation behaves on real data. Then I check for duplicates and required-field gaps. Migration defects are silent until a user opens the wrong record, so reconciliation isn't optional, it's the difference between finding the problem in testing or in a customer complaint."
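A rough sketch of those reconciliation checks, assuming source and target extracts are already loaded as lists of dictionaries. The field names (`id`, `account_id`, `email`) and the choice of email as the duplicate key are illustrative assumptions.

```python
def reconcile(source, target, parent_ids):
    """Compare migrated rows with their source extract."""
    target_by_id = {row["id"]: row for row in target}
    # Rows that never arrived.
    missing = [row["id"] for row in source if row["id"] not in target_by_id]
    # Rows that arrived with a different field value.
    mismatched = [
        row["id"]
        for row in source
        if row["id"] in target_by_id
        and row["email"] != target_by_id[row["id"]]["email"]
    ]
    # Children whose parent was not migrated.
    orphans = [row["id"] for row in target if row["account_id"] not in parent_ids]
    # Later rows that repeat an email already seen in the target.
    seen, duplicates = set(), []
    for row in target:
        if row["email"] in seen:
            duplicates.append(row["id"])
        seen.add(row["email"])
    return {
        "count_in": len(source),
        "count_out": len(target),
        "missing": missing,
        "mismatched": mismatched,
        "orphans": orphans,
        "duplicates": duplicates,
    }


source = [
    {"id": "c1", "account_id": "a1", "email": "ana@example.com"},
    {"id": "c2", "account_id": "a1", "email": "bo@example.com"},
    {"id": "c3", "account_id": "a2", "email": "cy@example.com"},
]
target = [
    {"id": "c1", "account_id": "a1", "email": "ana@example.com"},
    {"id": "c2", "account_id": "a9", "email": "ana@example.com"},
]
report = reconcile(source, target, parent_ids={"a1", "a2"})
for check, result in report.items():
    print(check, result)
```

On real volumes the same checks would run as queries rather than in memory, but the categories stay the same: counts, missing rows, field mismatches, orphans, duplicates.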

Q40. How do you sign off a release to production?

Short: Acceptance criteria met, regression green, no open high-severity defects, UAT approved, and a documented go decision.

What to actually say: "I confirm the traceability matrix is complete, regression and UAT passed, and open defects are either resolved or formally accepted with a documented workaround. Then I write a sign-off summarizing the residual risk. Sign-off is a decision backed by evidence, not a rubber stamp, and if I'm not comfortable, I say so in writing before the deploy, not after. The written risk statement protects everyone, including me, when something surfaces later."

Section 5 - CI/CD, performance, security & AI testing (Q41-Q50)

This is the modern QA frontier: tests in the pipeline, performance at scale, and the genuinely new problem of testing non-deterministic AI agents. Strong candidates have an answer for the AI questions, because most don't.

Q41. How do Apex tests run in a CI/CD pipeline?

Short: The deployment runs the org's Apex tests, and the pipeline fails if they fail or coverage drops below 75%.

What to actually say: "On deploy, the platform runs the specified Apex tests, and the pipeline gates on the results and on coverage. With Salesforce DX and a tool on top, I run tests on every pull request against a scratch org, so a broken test blocks the merge rather than the release. That shift, from testing at release time to testing at merge time, is what keeps a broken change from ever reaching the integration branch. The DevOps tooling comparison covers where this runs."

Q42. What is static code analysis, and where does QA care?

Short: Tools like PMD and the Salesforce Code Analyzer scan code for bug patterns and security issues without running it.

What to actually say: "It catches SOQL in loops, missing CRUD and FLS checks, hardcoded IDs, and known anti-patterns before any test runs. As QA I care because a clean static scan in the pipeline prevents a whole class of defects I'd otherwise find by hand, and it catches security gaps that functional testing easily misses. It's cheap, automated coverage of the boring, dangerous stuff, and I treat a failing scan like a failing test."

Q43. How do you do performance and large-data-volume testing?

Short: Load representative data volumes into a Full sandbox and measure page loads, queries, and batch jobs against it.

What to actually say: "I seed millions of records to mirror production, then measure list views, reports, SOQL response times, and batch job durations against that volume. The bugs that only appear at scale (non-selective queries, sharing recalculation lag, row-lock contention) never show up against a hundred test records. Performance testing on tiny data is theater, so the Full sandbox with production-scale data is non-negotiable for any org that's actually large.

The symptoms I hunt for are specific: a query that goes non-selective past a million rows, sharing recalculation that runs for hours after a role change, and row-lock contention when an integration updates the same parent from parallel threads. Each of those needs real volume to trigger, which is the whole reason the measurement has to happen on a Full sandbox loaded to production scale."

Q44. How do you test security?

Short: Verify profiles, permission sets, sharing, and field-level security match the access design, for each persona.

What to actually say: "I test least privilege: each persona can do exactly what it should and nothing more. That means object CRUD, field-level security, record sharing, and feature access checked per profile and permission set. I also confirm that Apex behind the UI enforces access rather than assuming the platform does, because a controller running in system mode can leak a field the page hides. An over-permissioned profile is a breach waiting for an audit to find it."

Q45. What's different about testing an Agentforce agent versus deterministic code?

Short: The agent's output varies between runs, so you test behavior and boundaries, not an exact string.

What to actually say: "Deterministic code returns the same output for the same input, so I can assert an exact value. An agent might phrase its answer differently each run, or occasionally choose a different path, so asserting an exact sentence is the wrong model. Instead I test whether it routed to the correct topic, called the right action with the right inputs, stayed grounded in the permitted data, and refused what it should refuse. I test the behavior envelope and measure a pass rate across many runs, rather than expecting one deterministic pass."

This is the question most QA candidates can't answer well in 2026, so a clear answer is a strong differentiator. The mental shift is from "does it equal X" to "does it behave correctly often enough, and fail safely when it fails."
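To make the shift concrete, here is a sketch of a behavior check that ignores wording and a pass rate measured across repeated runs. `AgentResult` and `fake_agent` are hypothetical stand-ins, not an Agentforce API: the fake varies its phrasing and misroutes about one time in ten so the harness has something to measure.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentResult:
    topic: str
    action: str
    action_inputs: dict
    text: str


def behaves_correctly(result, topic, action, inputs):
    """Assert on routing and action inputs, never on the exact sentence."""
    return (
        result.topic == topic
        and result.action == action
        and result.action_inputs == inputs
    )


def fake_agent(utterance, rng):
    """Hypothetical stand-in for a real agent call."""
    text = rng.choice(["Your order ships Friday.", "It will ship on Friday."])
    topic = "Order Status" if rng.random() > 0.1 else "General FAQ"
    return AgentResult(topic, "Get_Order_Status", {"order_id": "A-100"}, text)


def pass_rate(utterance, runs, seed=0):
    """Fraction of runs whose behavior matched the expectation."""
    rng = random.Random(seed)
    passed = sum(
        behaves_correctly(
            fake_agent(utterance, rng),
            "Order Status",
            "Get_Order_Status",
            {"order_id": "A-100"},
        )
        for _ in range(runs)
    )
    return passed / runs


rate = pass_rate("Where is my order?", runs=500)
print(f"pass rate: {rate:.1%}")
```

Two answers with different wording both pass, while a misrouted answer fails even if its text reads fine. The threshold that counts as acceptable is a risk decision for the team, not something the harness can choose.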

Q46. What is the Agentforce Testing Center, and how do you use it?

Short: Salesforce's tool for batch-testing agents against a set of test utterances and expected outcomes.

What to actually say: "I load a set of representative user inputs, each with the expected topic and action, run them as a batch, and review where the agent diverged from the expectation. It turns 'I tried a few prompts and it seemed fine' into a repeatable evaluation I can rerun after every change to the agent, its instructions, or its grounding. The Testing Center guide walks through building the test set, which is the real work."

Q47. How do you build a good eval set for an agent?

Short: Representative utterances, including paraphrases, edge cases, and things the agent should refuse, each with an expected outcome.

What to actually say: "I cover the common requests, the messy real-world phrasings of those same requests, the off-topic and adversarial inputs, and the cases where the correct behavior is to decline or escalate. Then I expand the set from real conversation logs once the agent is live, because production surfaces phrasings I'd never invent. A set of ten happy-path questions tells me nothing about how the agent fails, and how it fails is what I most need to know before it talks to a customer."

I size the set to the risk: a low-stakes internal helper might get thirty utterances, while a customer-facing service agent gets hundreds, including the rude, the confused, and the deliberately adversarial inputs real users send. I also keep a regression slice of every prompt that ever produced a bad answer in production, so a fix for one failure can't quietly reintroduce another. The eval set is a living asset that grows with the agent, not a one-time checklist.

Q48. How do you test the Einstein Trust Layer behaviors?

Short: Verify PII masking, grounding on permitted data only, and that the agent respects sharing and field security.

What to actually say: "I feed inputs containing PII and confirm it's masked before the prompt leaves the org, check that the agent only retrieves data the running user is allowed to see, and confirm toxic or out-of-scope requests are handled rather than answered. The Trust Layer enforces a lot automatically, but the write actions behind an agent are still my responsibility to test for over-access, because an action running with too much permission is a leak the Trust Layer won't catch for me."

A concrete test: I send the agent a message containing a credit-card number and a personal email, then inspect the prompt that actually left the org to confirm both were masked before reaching the model. I run the agent as a low-privilege user and confirm it can't surface records that user has no access to. And I check that any action which writes data enforces field-level security, because the Trust Layer guards the model boundary, while a sloppily written action can still read or write past what the running user is allowed.
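The inspection step in that test can be partly automated with a scan of the captured outbound prompt. This is a sketch under assumptions: the regular expressions are deliberately simple, the placeholder tokens in the example are invented, and how the outbound prompt is captured depends on the org's audit setup and is not shown.

```python
import re

# Simple illustrative patterns, not production-grade PII detection.
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def unmasked_pii(outbound_prompt):
    """Return the kinds of PII still visible in a prompt sent to the model."""
    findings = []
    if CARD.search(outbound_prompt):
        findings.append("card_number")
    if EMAIL.search(outbound_prompt):
        findings.append("email")
    return findings


raw = "Refund card 4111 1111 1111 1111 and email jo@example.com"
# Placeholder tokens are hypothetical; the real mask format may differ.
masked = "Refund card [MASKED_CARD] and email [MASKED_EMAIL]"

print(unmasked_pii(raw))
print(unmasked_pii(masked))
```

A scan like this only covers the model boundary. The low-privilege user checks and the field-level security checks on write actions still need their own tests.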

Q49. How do you prevent flaky tests?

Short: Deterministic data, isolated tests, explicit waits instead of sleeps, and no shared mutable state.

What to actually say: "Flaky tests are worse than no tests, because they train the team to ignore red, and then a real failure gets ignored too. I make each test build and tear down its own data, wait on a condition rather than a fixed timer, and avoid order dependencies between tests. For agents, which are inherently non-deterministic, I don't pretend a single run is reliable, I run the eval enough times to measure a pass rate and set a threshold, so a one-off variation doesn't fail the build."
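"Wait on a condition rather than a fixed timer" usually comes down to a small polling helper like the sketch below. UI frameworks such as Playwright ship their own waiting mechanisms, so treat this as an illustration of the idea for cases that need a hand-rolled wait, such as an asynchronous job finishing. The `job_finished` check is a hypothetical stand-in.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll condition() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical check: pretend the job completes on the third poll.
polls = {"count": 0}


def job_finished():
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until(job_finished, timeout=2.0, interval=0.01))
```

The test returns as soon as the condition holds, so it is faster than a fixed sleep on a good day and more tolerant on a slow one. A `False` return should fail the test with a clear timeout message instead of letting a later assertion fail confusingly.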

Q50. What metrics do you report as QA?

Short: Coverage, pass rate, defect density, defect escape rate, and automation coverage of regression.

What to actually say: "The metric I watch hardest is escape rate: the defects that reached production. It's the honest measure of whether testing actually worked, because everything else measures effort. I pair it with regression automation percentage and open-defect aging to show trend and risk. Vanity metrics like raw test count tell you how busy QA was, not whether the product is good, and I steer stakeholders toward escape rate because it's the one that maps to customer pain."

I present escape rate as a trend across releases, not a single number, because the direction matters more than the absolute value. A team going from eight escapes a quarter to two is improving even if two isn't zero. I pair the trend with a short note on what the escaped defects had in common, because that pattern is where the next process fix comes from. Stakeholders engage with "escapes are down, and here's the common cause we're addressing" far more than with a raw count of tests executed, which tells them nothing about whether the product got better.
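A small worked example of the trend. It assumes one common definition of escape rate, escaped defects divided by all defects found for that release, which teams may define differently. The quarterly figures are made up, apart from the eight-to-two movement mentioned above.

```python
def escape_rate(escaped, caught_before_release):
    """Share of a release's defects that were found in production."""
    total = escaped + caught_before_release
    return escaped / total if total else 0.0


# (period, escaped to production, caught before release): illustrative numbers.
releases = [("Q1", 8, 72), ("Q2", 4, 76), ("Q3", 2, 78)]

trend = [(name, round(escape_rate(esc, caught), 3)) for name, esc, caught in releases]
improving = all(later[1] < earlier[1] for earlier, later in zip(trend, trend[1:]))

for name, rate in trend:
    print(f"{name}: {rate:.1%}")
print("improving:", improving)
```

Reporting the rate instead of the raw count keeps the metric honest when release size changes: two escapes from a small release is not the same signal as two escapes from a large one.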

Behavioral questions (5 patterns)

Behavioral rounds for QA probe judgment and backbone: will you hold the line on quality, and can you do it without becoming the team's blocker.

B1. "Tell me about a critical bug that escaped to production."

What they want: ownership and a process improvement, not a confession.

What to say: name a real escape, explain how it was found in production, the immediate fix, and the regression test or process change you added so that class of bug can't escape again. "We added it to the regression pack and a static-analysis rule caught the pattern after that" beats "we were more careful." The escape is the setup; the prevention is the point.

B2. "Describe pushing back when a developer or PM wanted to skip testing."

What they want: backbone with pragmatism.

What to say: show how you quantified the risk in terms they cared about ("this touches billing, which has an escape history"), offered a scoped option ("smoke the rest, full-test billing"), and documented the decision. Avoid both "I just approved it under pressure" and "I refused and blocked the release." The senior move is making the risk visible and letting the owner decide with eyes open.

B3. "Tell me about automating a painful manual regression."

What they want: initiative and return-on-investment thinking.

What to say: pick a slow, repetitive suite, explain how you automated the stable core, and quantify the time saved per release. Mention what you deliberately left manual and why, because knowing what not to automate is as much a signal as knowing what to automate.

B4. "How do you keep up with Salesforce releases as QA?"

What they want: evidence you treat releases as a managed risk.

What to say: preview-sandbox regression runs each season, reading the release notes specifically for deprecations and changed defaults, the Trailblazer Community, and a habit of filing platform-change defects early in the window. Name a recent release change you caught in preview before it hit production.

B5. "Describe a disagreement over a defect's severity or priority."

What they want: judgment and collaboration.

What to say: show you separated severity (your call, based on impact) from priority (the product owner's call, based on urgency), brought data rather than opinion, and reached a decision without it turning personal. Bonus if you mention revisiting the call when new information arrived, which shows you hold positions loosely when the facts change.

Scenario questions (5 patterns)

Scenario questions test how you'd actually work a problem. Walk through the steps out loud and state what you'd check first.

S1. "A Flow works for one record but fails on a 200-record import. How do you catch this earlier?"

What they want: bulk-testing instinct. Steps: add a bulk functional test that imports 200 records and asserts the outcome, review the automation for per-record queries or DML, add the bulk case to the regression pack, and push a standard that every automation gets a bulk test, beyond the single-record happy path. The root cause is almost always a query or DML inside a loop that crosses a governor limit at volume.
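The failure mode is easy to simulate. The sketch below is plain Python, not Apex or Flow: a hypothetical `Transaction` class counts queries against the platform's limit of 100 SOQL queries per synchronous transaction, and two versions of the same logic run against one record and against 200.

```python
SOQL_LIMIT = 100  # SOQL queries allowed per synchronous transaction


class LimitExceeded(Exception):
    pass


class Transaction:
    """Hypothetical stand-in that counts queries like a governor limit."""

    def __init__(self):
        self.queries = 0

    def query(self, record_ids):
        self.queries += 1
        if self.queries > SOQL_LIMIT:
            raise LimitExceeded(f"too many queries: {self.queries}")
        return {rid: f"owner-of-{rid}" for rid in record_ids}


def per_record(record_ids, tx):
    # Anti-pattern: one query per record, inside the loop.
    return [tx.query([rid])[rid] for rid in record_ids]


def bulkified(record_ids, tx):
    # One query for the whole batch, then an in-memory lookup.
    owners = tx.query(record_ids)
    return [owners[rid] for rid in record_ids]


batch = [f"rec-{i}" for i in range(200)]

print(len(per_record(batch[:1], Transaction())))  # the single-record test passes
try:
    per_record(batch, Transaction())
except LimitExceeded as err:
    print("bulk run failed:", err)

tx = Transaction()
bulkified(batch, tx)
print("bulkified queries used:", tx.queries)
```

The single-record test passes for both versions, which is exactly why the happy path alone proves nothing here. Only the 200-record case separates them.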

S2. "The Spring '26 preview sandbox breaks a key automation. Walk me through it."

What they want: release-management process. Steps: reproduce it in the preview org, read the release notes to find the changed default or deprecation behind it, log a defect with the production-upgrade deadline attached so it gets prioritized, work with the developer or admin on the fix, and rerun regression in preview before the org auto-upgrades. The deadline is the lever that turns a preview finding into action.

S3. "An Agentforce agent gives a wrong answer roughly 1 in 20 times. How do you test and triage it?"

What they want: non-deterministic testing maturity. Steps: build an eval set in the Testing Center covering the failing pattern, run it enough times to measure the real failure rate rather than guessing from one bad run, inspect the traces to find whether it's topic routing, a bad action input, or a grounding gap, fix the most likely cause (usually the action description or the grounding data), and rerun the eval to confirm the rate dropped. The mindset is statistical, not binary.
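"Enough times" can be estimated before running anything. Assuming each run is independent and the true failure rate is about 1 in 20, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The helper names below are illustrative.

```python
import math


def chance_of_seeing_failure(failure_rate, runs):
    """Probability that at least one of the runs fails."""
    return 1 - (1 - failure_rate) ** runs


def runs_needed(failure_rate, confidence=0.95):
    """Smallest run count that surfaces a failure with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


p = 1 / 20
print(f"20 runs: {chance_of_seeing_failure(p, 20):.0%} chance of seeing a failure")
print(f"runs for 95% confidence: {runs_needed(p)}")
```

Twenty runs miss a 1-in-20 failure roughly a third of the time, so a clean batch of twenty proves little. Confirming that a fix lowered the rate takes more runs again, because the comparison is between two rates and not just a check that one failure is possible.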

S4. "Two users see different totals on the same report. Is it a bug?"

What they want: visibility reasoning. Steps: confirm both are running the same report with the same filters, then check sharing, role hierarchy, and field-level security, because different users legitimately see different rows. Reproduce as each user, and only escalate it as a data or formula bug once sharing is ruled out. Most "wrong report" tickets are sharing working exactly as designed.

S5. "A deployment failed because org coverage dropped below 75%. What do you do?"

What they want: pipeline literacy. Steps: identify which new or changed class lowered the coverage, confirm the missing tests would be meaningful with real assertions rather than filler, get proper tests written for the uncovered logic, rerun the suite in the pipeline, and treat the gap as a process miss to prevent, not a number to top up with empty tests. Topping up coverage with assertion-free tests passes the gate and fails the purpose.
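The first step, finding which class lowered the number, is simple arithmetic once per-class line counts are available. In this sketch the class names and counts are invented, and org-wide coverage is taken as covered lines divided by total lines across all classes.

```python
# class name -> (covered lines, total lines). Illustrative figures only.
coverage = {
    "AccountService": (450, 500),
    "QuoteCalculator": (240, 300),
    "NewDiscountEngine": (30, 200),
}


def org_coverage(classes):
    """Covered lines over total lines across every class."""
    covered = sum(c for c, _ in classes.values())
    total = sum(t for _, t in classes.values())
    return covered / total


def by_uncovered_lines(classes):
    """Classes ordered by how many uncovered lines they contribute."""
    return sorted(
        ((name, total - covered) for name, (covered, total) in classes.items()),
        key=lambda item: item[1],
        reverse=True,
    )


print(f"org coverage: {org_coverage(coverage):.1%}")
for name, uncovered in by_uncovered_lines(coverage):
    print(name, uncovered)

without_new_class = {k: v for k, v in coverage.items() if k != "NewDiscountEngine"}
print(f"before the new class: {org_coverage(without_new_class):.1%}")
```

Sorting by uncovered lines, not by percentage, points at the class whose tests would move the org-wide number most. That is also where the meaningful assertions are most likely missing.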

Red flags interviewers watch for

The patterns that lose QA offers, regardless of how long your tool list is:


Treating 75% coverage as the goal. Coverage theater, tests with no assertions, signals you measure effort instead of quality. Talk about asserting outcomes before you mention the percentage.
Manual-only thinking. No instinct for API or Apex-level testing, or the opposite mistake of wanting to automate everything through a brittle UI. The pyramid is the expected mental model in 2026.
Testing only as System Administrator. Misses every sharing, role, and field-security defect, which are the most expensive ones to find late. Always test through a regular user's eyes.
Ignoring bulk and data volume. Testing one record and declaring victory, when the import breaks it at 200 and the report times out at ten million.
Testing AI like deterministic code. Asserting an exact agent response, or "I tried a few prompts," instead of building an eval set and measuring a pass rate. This is the fastest way to look out of date in a 2026 QA loop.

How to prep

Day 1: read this list cover to cover and mark your weakest section. For most QA candidates it's Apex testing or AI testing.
Day 2: open a scratch org and write two Apex tests with real assertions and a bulk case, so you can talk about coverage from experience rather than theory.
Day 3: build a small API test collection in Postman and one UI automation flow, then deliberately break a selector and fix it, so the brittleness question is something you've felt.
Day 4: practice scoping a risk-based regression and triaging a defect's severity versus priority, out loud, because the verbal articulation is what the interview tests.
Day 5: review AI testing: build a tiny eval set for an agent and run it twice to see the variance for yourself, which makes the non-determinism real instead of abstract.

What to read next

Salesforce Sandbox Types Explained - the environment strategy behind Section 1.
Agentforce Testing Center Guide - how to test agents, in depth.
DevOps Tools Compared - where your tests run in the pipeline.
50 Salesforce Developer Interview Questions - the code your tests have to cover.
50 Salesforce Admin Interview Questions - the configuration you regression-test every release.
The full 2026 interview series: Admin, Developer, Consultant, and Architect.

Pick the section you were weakest on, open a sandbox tonight, and either write a test with a real assertion or build a five-question eval set for an agent. QA interviews reward the candidate who has felt a test go flaky and fixed it, not the one who can only name tools.

About the Author

Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals.
