What does the Salesforce error "Sandbox refresh failed / Sandbox is in Copying status" mean and how do I fix it?


Sandbox refresh failed / Sandbox is in Copying status

A sandbox refresh job failed mid-process or is stuck in Copying. Common causes: the refresh interval for the sandbox type has not yet elapsed, the source production org changed during the copy, or a platform infrastructure issue. Check the status, wait if the job is still within its expected window, and file a support case for a job that is stuck.

Also seen as: Sandbox refresh failed · Sandbox is in Copying · sandbox refresh stuck · sandbox refresh limit

A release manager kicks off a sandbox refresh on Friday afternoon so the test team has a clean copy on Monday. The refresh job sits in "Copying" status for forty-eight hours, then either fails with an opaque error message or stays stuck indefinitely. The release window starts in two days and the team cannot deploy until that sandbox is healthy again.

What the platform is checking

A sandbox refresh copies the production org's metadata into a sibling org. Full sandboxes also copy the data. Partial Copy sandboxes copy a sample of the data, selected by a sandbox template. The copy is an orchestrated operation that touches storage, compute, and metadata services across the Salesforce infrastructure.

The platform reports refresh status on the Sandboxes page in Setup and through the Tooling API. SandboxProcess records track each copy job's status and progress, and SandboxInfo holds the sandbox definition. When a refresh enters the Copying state, it stays there until every step finishes or a step fails. A failure can show up as a status of "Activation Failed", as a generic refresh error, or as an indefinite stall in Copying.

The cause is rarely a single thing. The most common patterns are large data volumes that exceed the allotted copy window, post-copy Apex scripts that throw uncaught exceptions, IP allowlists or session policies that block the refresh service, and metadata that references components no longer available in the source.

The refresh is not interactive. Once it starts, the running user cannot pause or inspect intermediate state. The platform owns the operation. Operators get visibility through status, support cases, and the occasional debug log.

The broken example

A team configures a SandboxPostCopy class that runs after every refresh. The class seeds test users, deactivates production integrations, and resets feature flags.

global class SandboxPostCopyHandler implements SandboxPostCopy {
    global void runApexClass(SandboxContext context) {
        User u = [SELECT Id, Email FROM User WHERE Username = 'integration@acme.com'];
        u.IsActive = false;
        update u;

        List<Opportunity> opps = [SELECT Id FROM Opportunity];
        for (Opportunity o : opps) {
            o.StageName = 'Sandbox Stage';
        }
        update opps;
    }
}

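Three problems compound. First, the query for the integration user assumes the record exists, but a recent cleanup removed it. Even if the user did exist, sandbox usernames carry the sandbox name as a suffix, so the production username would never match. A single-record query that finds no rows throws "System.QueryException: List has no rows for assignment to SObject". Second, the SOQL on Opportunity has no LIMIT. In a full sandbox with a million Opportunities, it exceeds the 50,000-row query limit. Third, even within limits, the bulk update can hit row-lock contention because parallel async work also touches Opportunity.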

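The script throws partway through. Because the exception is unhandled, the platform marks the refresh as Activation Failed and surfaces a terse message on the Sandboxes page. Apex also rolls back every DML statement in the failed transaction, so none of the post-copy setup takes effect. The sandbox exists but is unusable: the integration user is still active, no Opportunity carries the new stage, and the state the team expects after a refresh never materializes.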

What actually fails

Three categories cover most production failures.

Post-copy script exceptions. The class that implements the SandboxPostCopy interface runs once per refresh. If it throws an unhandled exception, the platform records the refresh as failed. Common culprits are missing seed records, governor-limit overruns, and assumptions about org configuration that change between refreshes.

Capacity and timing. Full sandboxes copy production data. Orgs with very high data volumes (hundreds of millions of records, large attachments, dense ContentDocument footprints) take longer. Salesforce schedules refreshes with concurrency and resource caps. A refresh queued behind other large operations can wait a long time before its window opens.

Source-org instability. If production is undergoing a major release, an instance migration, or a high-load incident, the platform throttles non-critical operations including sandbox copies. The refresh status remains Copying while the platform waits for capacity.

The fix, three paths

Fix the post-copy script. Treat the SandboxPostCopy class as production code. Guard every query against empty results. Add LIMIT clauses, and hand large data volumes to a batch job started with Database.executeBatch. Wrap the body in try/catch and log the failure to a custom object so the next refresh has diagnostic context.

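global class SandboxPostCopyHandler implements SandboxPostCopy {
    global void runApexClass(SandboxContext context) {
        try {
            disableIntegrationUser(context.sandboxName());
            scheduleOpportunityRefresh();
        } catch (Exception e) {
            // Log instead of rethrowing so the refresh is not marked as failed.
            insert new Sandbox_Refresh_Log__c(
                Status__c = 'Error',
                Message__c = e.getMessage().left(255),
                Stack__c = e.getStackTraceString().left(32000)
            );
        }
    }

    private void disableIntegrationUser(String sandboxName) {
        // Sandbox usernames carry ".<sandbox name>" as a suffix.
        String username = 'integration@acme.com.' + sandboxName;
        List<User> users = [
            SELECT Id, IsActive
            FROM User
            WHERE Username = :username
            LIMIT 1
        ];
        // A list query returns an empty list instead of throwing when no row matches.
        if (!users.isEmpty() && users[0].IsActive) {
            users[0].IsActive = false;
            update users[0];
        }
    }

    private void scheduleOpportunityRefresh() {
        // Hand the large update to a batch job instead of one synchronous DML call.
        Database.executeBatch(new OpportunityStageReseedBatch(), 200);
    }
}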

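The handler checks that the integration user exists in the new sandbox before touching it. The Opportunity reseed is delegated to a batch job. The job spreads the DML across many transactions, and each chunk of 200 records runs under its own governor limits instead of sharing a single transaction's budget.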
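The handler starts a class named OpportunityStageReseedBatch, which the article does not show. Below is a minimal sketch of what it might look like. The class body is an assumption, and so is the 'Sandbox Stage' value carried over from the broken example. Use a stage value that the org's Stage picklist accepts.

```apex
// Hypothetical batch class started by SandboxPostCopyHandler.
// Assumption: every Opportunity gets the same placeholder stage.
public with sharing class OpportunityStageReseedBatch implements Database.Batchable<SObject> {
    // Placeholder value; replace with a stage that exists in the org.
    private static final String SANDBOX_STAGE = 'Sandbox Stage';

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // A QueryLocator can page through up to 50 million records,
        // far beyond the 50,000-row limit of a single query.
        return Database.getQueryLocator([SELECT Id, StageName FROM Opportunity]);
    }

    public void execute(Database.BatchableContext bc, List<Opportunity> scope) {
        // Each execute call runs in its own transaction with fresh governor limits.
        for (Opportunity o : scope) {
            o.StageName = SANDBOX_STAGE;
        }
        // allOrNone = false: a locked or invalid row does not roll back the chunk.
        Database.SaveResult[] results = Database.update(scope, false);
        for (Integer i = 0; i < results.size(); i++) {
            if (!results[i].isSuccess()) {
                // getId() is empty on failure, so report the Id from the input list.
                System.debug(LoggingLevel.WARN, 'Reseed failed for ' + scope[i].Id
                    + ': ' + results[i].getErrors()[0].getMessage());
            }
        }
    }

    public void finish(Database.BatchableContext bc) {
        System.debug('Opportunity stage reseed finished, job ' + bc.getJobId());
    }
}
```

The sketch passes false as the second argument to Database.update. That keeps one locked row from failing the whole chunk, which matters when parallel async work also touches Opportunity.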

Open a support case for a stuck refresh. When a refresh stays in Copying longer than its expected window (typically 24 hours for Partial Copy, longer for Full Copy depending on data volume), open a case with the org ID, the sandbox name, and the refresh start time. Salesforce support can see inside the copy job; admins cannot. Most stuck refreshes resolve once support restarts the job or reschedules it on a different copy worker.

Retry the refresh. Once the post-copy script is fixed and deployed to production, queue a new refresh. The retry picks up the corrected class. A sandbox in Activation Failed state can sometimes be activated manually from the sandbox detail page. If it cannot, a fresh refresh is the fallback.

The fixed example

A SandboxPostCopy class with defensive guards, log persistence, and async hand-off for large work:

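global class SandboxPostCopyHandler implements SandboxPostCopy {
    global void runApexClass(SandboxContext context) {
        String sandboxName = context.sandboxName();
        Sandbox_Refresh_Log__c log = new Sandbox_Refresh_Log__c(
            Sandbox_Name__c = sandboxName,
            Started_At__c = System.now()
        );
        try {
            seedTestUsers();
            disableProductionIntegrations();
            resetFeatureFlags();
            log.Status__c = 'Success';
        } catch (Exception e) {
            log.Status__c = 'Error';
            log.Message__c = e.getMessage().left(255);
            log.Stack__c = e.getStackTraceString().left(32000);
        }
        log.Finished_At__c = System.now();
        // Persist the log whether the steps succeeded or failed.
        insert log;
    }

    // Skip seeding when QA users already exist.
    // TestUserFactory is assumed to insert users with no UserRoleId; inserting users
    // with a role in the same transaction as the log insert raises MIXED_DML_OPERATION.
    private void seedTestUsers() {
        if ([SELECT COUNT() FROM User WHERE Profile.Name = 'QA Tester'] > 0) {
            return;
        }
        TestUserFactory.createBatch(5);
    }

    private void disableProductionIntegrations() {
        // Updating User.IsActive in the same transaction as the log insert would raise
        // MIXED_DML_OPERATION, so the deactivation runs in its own future transaction.
        // Failures there appear in Apex Jobs, not in Sandbox_Refresh_Log__c.
        deactivateIntegrationUsers();
    }

    @future
    private static void deactivateIntegrationUsers() {
        List<User> integrations = [
            SELECT Id, IsActive
            FROM User
            WHERE Profile.Name = 'Integration User' AND IsActive = true
            LIMIT 50
        ];
        for (User u : integrations) {
            u.IsActive = false;
        }
        if (!integrations.isEmpty()) {
            update integrations;
        }
    }

    // Large resets go to a batch job so each chunk gets its own governor limits.
    private void resetFeatureFlags() {
        Database.executeBatch(new FeatureFlagReseedBatch(), 200);
    }
}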

The class records every refresh attempt to a custom object. Subsequent investigations have a paper trail of what ran, when it ran, and what failed.

Edge cases and gotchas

Sandbox name conflicts. Creating a sandbox with the name of a recently deleted sandbox can fail because the platform reserves the name for a cooldown period after deletion. Either wait out the cooldown or choose a different sandbox name on the next create.

Outbound network calls. A post-copy script that calls an external service fails in the sandbox if the callout endpoint is not covered by a Remote Site Setting or a Named Credential. Callout configuration can differ between production and sandbox because Remote Site Settings are not always copied identically. Confirm that the endpoints are registered in the sandbox before the script relies on them.
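A script can check for a Named Credential before it makes a call, then skip or log the step when the credential is missing. Below is a minimal sketch that assumes the script authenticates through Named Credentials. The class and method names are hypothetical. Remote Site Settings are left out because checking them would need a Tooling API query.

```apex
// Hypothetical pre-flight check for a post-copy script that makes callouts.
public with sharing class PostCopyCalloutPreflight {
    // Returns true when a Named Credential with this developer name exists in the org.
    public static Boolean hasNamedCredential(String developerName) {
        List<NamedCredential> creds = [
            SELECT Id
            FROM NamedCredential
            WHERE DeveloperName = :developerName
            LIMIT 1
        ];
        return !creds.isEmpty();
    }
}
```

In the handler, a call such as PostCopyCalloutPreflight.hasNamedCredential('Billing_API') can gate the callout. The credential name 'Billing_API' is hypothetical. When the check returns false, the script can write an entry to the refresh log instead of throwing.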

Encrypted fields. Shield Platform Encryption uses tenant-specific keys. A full sandbox copies the encrypted ciphertext but uses sandbox keys, so encrypted field values do not round-trip without explicit key management. Plan the key export and import as part of the refresh runbook.

Async jobs in flight. A sandbox refresh started while batch jobs are running on the source can interact in unexpected ways. The copy snapshots the data, but jobs that complete mid-copy may have inconsistent state in the target. Quiesce long-running jobs in production before kicking off a critical refresh.
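Before a critical refresh, a quick query in production shows which batch jobs are still running. Below is a minimal sketch, meant to be run as anonymous Apex or from a runbook utility in production. The class name is hypothetical. The status set lists the AsyncApexJob statuses that mean a job has not finished.

```apex
// Hypothetical pre-refresh check, run in the production org.
public with sharing class RefreshQuiesceCheck {
    // AsyncApexJob statuses for jobs that are waiting or still running.
    private static final Set<String> ACTIVE_STATUSES =
        new Set<String>{ 'Holding', 'Queued', 'Preparing', 'Processing' };

    // Lists batch jobs that have not reached Completed, Failed, or Aborted.
    public static List<AsyncApexJob> inFlightBatchJobs() {
        return [
            SELECT Id, ApexClass.Name, Status, JobItemsProcessed, TotalJobItems, CreatedDate
            FROM AsyncApexJob
            WHERE JobType = 'BatchApex' AND Status IN :ACTIVE_STATUSES
            ORDER BY CreatedDate
            LIMIT 200
        ];
    }
}
```

An empty list means no batch work is in flight. Otherwise, wait for the listed jobs to finish, or abort them with System.abortJob, before starting the refresh.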

Custom metadata vs custom settings. Custom metadata records are copied as part of the metadata layer. Custom settings, whether hierarchy or list, are copied as data records. If your post-copy script depends on a particular custom setting being present, check which type it is and confirm it survived the copy.
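A post-copy script can confirm that a setting survived before relying on it. Below is a minimal sketch that assumes a hierarchy custom setting named Feature_Flags__c and a list custom setting named Integration_Endpoint__c. Both names are hypothetical. The sketch relies on two behaviors: getOrgDefaults() returns an empty record when no org-level row exists, and getValues() returns null for a missing list entry.

```apex
// Hypothetical guard for a post-copy script that depends on custom settings.
// Feature_Flags__c (hierarchy) and Integration_Endpoint__c (list) are assumed names.
public with sharing class PostCopySettingsCheck {
    // getOrgDefaults() returns an empty record (Id == null) when no org-level row exists.
    public static Boolean hasOrgDefaultFlags() {
        Feature_Flags__c defaults = Feature_Flags__c.getOrgDefaults();
        return defaults != null && defaults.Id != null;
    }

    // getValues() on a list custom setting returns null when the named row is missing.
    public static Boolean hasEndpoint(String name) {
        return Integration_Endpoint__c.getValues(name) != null;
    }
}
```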

Defensive habits

Treat the post-copy script as a production deploy. Write tests that exercise the class against a representative org state. Run those tests on every deploy that touches the class. Catching a regression in CI is cheaper than catching it after a 6-hour failed refresh.

Keep a Sandbox_Refresh_Log__c (or similar) custom object that captures every post-copy run. The log object accumulates a history that is invaluable when a refresh fails six months from now and the original engineer has moved on.

Document the expected refresh duration for each sandbox tier (Developer, Developer Pro, Partial Copy, Full). When a refresh runs longer than the documented window, that is the signal to escalate rather than wait. Including the documented bounds in the release runbook prevents the team from assuming a stuck refresh is normal.
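Writing the escalation rule as code keeps the runbook and any monitoring in agreement. Below is a minimal sketch with a hypothetical class name. The per-tier hours are placeholders, not published Salesforce figures. The license type keys are assumed to mirror the LicenseType values on the Tooling API sandbox objects. The 2x factor matches the threshold in the diagnosis checklist later in this article.

```apex
// Hypothetical escalation helper for the release runbook.
// The hour values are placeholders: replace them with the durations
// your team has observed for each sandbox tier.
public with sharing class RefreshEscalationPolicy {
    private static final Map<String, Integer> EXPECTED_HOURS = new Map<String, Integer>{
        'DEVELOPER' => 2,
        'DEVELOPER_PRO' => 4,
        'PARTIAL' => 24,
        'FULL' => 72
    };
    private static final Integer ESCALATION_FACTOR = 2;

    // True once elapsed time exceeds ESCALATION_FACTOR times the expected window.
    public static Boolean shouldEscalate(String licenseType, Datetime startedAt, Datetime now) {
        if (licenseType == null || startedAt == null || now == null) {
            return false;
        }
        Integer expectedHours = EXPECTED_HOURS.get(licenseType.toUpperCase());
        if (expectedHours == null) {
            return false; // Unknown tier: leave the decision to a human.
        }
        Long elapsedMs = now.getTime() - startedAt.getTime();
        Long thresholdMs = ESCALATION_FACTOR * expectedHours * 60L * 60L * 1000L;
        return elapsedMs > thresholdMs;
    }
}
```

With these placeholder numbers, a Partial Copy refresh that started at start = Datetime.newInstanceGmt(2024, 1, 5, 18, 0, 0) triggers escalation at start.addHours(49), because 49 hours exceeds twice the 24-hour window.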

For full sandboxes with high data volumes, schedule refreshes during off-peak hours and avoid running them in parallel with other large operations. Two simultaneous full-copy refreshes contend for the same compute capacity and either or both can be delayed.

Test patterns

Unit-test the post-copy class with Test.testSandboxPostCopyScript. It supplies the SandboxContext (org ID, sandbox ID, sandbox name) and invokes runApexClass on the handler:

@isTest
private class SandboxPostCopyHandlerTest {
    @isTest
    static void postCopyHandlesMissingUsers() {
        Test.startTest();
        Test.testSandboxPostCopyScript(
            new SandboxPostCopyHandler(),
            UserInfo.getOrganizationId(),
            UserInfo.getOrganizationId(),
            'UnitTestSandbox'
        );
        // stopTest() also runs any future and batch work the handler queued.
        Test.stopTest();

        Sandbox_Refresh_Log__c log = [
            SELECT Status__c FROM Sandbox_Refresh_Log__c
            WHERE Sandbox_Name__c = 'UnitTestSandbox' LIMIT 1
        ];
        System.assertEquals('Success', log.Status__c);
    }
}

The test confirms the handler completes even when expected users are missing. A second test should deliberately trigger an exception to verify the error path writes a meaningful log entry.
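Below is a sketch of that second test. It assumes a test seam that the handler shown earlier does not have. The seam is a hypothetical @TestVisible static Boolean forceFailureForTest field plus a custom ForcedFailureException class. When the flag is set, runApexClass throws the exception at the start of its try block. Without a seam like this, the error path is hard to reach from a test.

```apex
@isTest
private class SandboxPostCopyHandlerErrorTest {
    // Assumes this hypothetical seam has been added to SandboxPostCopyHandler:
    //   @TestVisible static Boolean forceFailureForTest = false;
    //   public class ForcedFailureException extends Exception {}
    // and, as the first statement inside the try block of runApexClass:
    //   if (forceFailureForTest) { throw new ForcedFailureException('Forced failure'); }
    @isTest
    static void postCopyLogsErrors() {
        SandboxPostCopyHandler.forceFailureForTest = true;

        Test.startTest();
        Test.testSandboxPostCopyScript(
            new SandboxPostCopyHandler(),
            UserInfo.getOrganizationId(),
            UserInfo.getOrganizationId(),
            'ErrorPathSandbox'
        );
        Test.stopTest();

        Sandbox_Refresh_Log__c log = [
            SELECT Status__c, Message__c FROM Sandbox_Refresh_Log__c
            WHERE Sandbox_Name__c = 'ErrorPathSandbox' LIMIT 1
        ];
        // The catch block must record the failure rather than let it escape.
        System.assertEquals('Error', log.Status__c);
        System.assert(log.Message__c.contains('Forced failure'));
    }
}
```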

Diagnosing a stuck refresh

When the refresh sits in Copying for an unusual length of time:

Open Setup, Sandboxes, and note the start time and current status. Compare against the documented duration window.
Check the production org's instance status on trust.salesforce.com for any ongoing incidents or scheduled maintenance.
Query the Tooling API for the SandboxInfo and SandboxProcess records to get the granular status.
Open a support case if the refresh exceeds twice the documented duration with no progress.
Capture the case number and update internal tickets so the team knows the escalation is in flight.
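The Tooling API query in the third step can run from Apex in the production org, where the sandbox records live. Below is a minimal sketch that makes several assumptions. A Remote Site Setting or Named Credential must allow callouts to the org's own My Domain URL. The running session must be API-enabled, as it is in anonymous Apex. API version v60.0 must be available. The class name is hypothetical. The SandboxProcess object, its Status and CopyProgress fields, and the /tooling/query endpoint are part of the Tooling API.

```apex
// Hypothetical helper that reads the latest SandboxProcess row for a sandbox
// through the Tooling API query endpoint. Run it in the production org.
public with sharing class SandboxProcessReader {
    public class ToolingQueryException extends Exception {}

    private static final String API_VERSION = 'v60.0'; // assumption: supported by the org

    // Builds the relative URL for the most recent copy job of the named sandbox.
    public static String queryPath(String sandboxName) {
        String soql = 'SELECT SandboxName, Status, CopyProgress, StartDate, EndDate '
            + 'FROM SandboxProcess WHERE SandboxName = \''
            + String.escapeSingleQuotes(sandboxName)
            + '\' ORDER BY StartDate DESC LIMIT 1';
        return '/services/data/' + API_VERSION + '/tooling/query/?q='
            + EncodingUtil.urlEncode(soql, 'UTF-8');
    }

    // Returns the newest SandboxProcess record as a map, or null if none exists.
    public static Map<String, Object> latest(String sandboxName) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(URL.getOrgDomainUrl().toExternalForm() + queryPath(sandboxName));
        req.setMethod('GET');
        req.setHeader('Authorization', 'Bearer ' + UserInfo.getSessionId());
        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new ToolingQueryException('Tooling query failed: ' + res.getStatus());
        }
        Map<String, Object> body = (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
        List<Object> records = (List<Object>) body.get('records');
        return records.isEmpty() ? null : (Map<String, Object>) records[0];
    }
}
```

For a sandbox named 'UAT' (a hypothetical name), SandboxProcessReader.latest('UAT') returns a map whose Status and CopyProgress entries show whether the copy is still moving. A CopyProgress value that stays the same across several checks is the "no progress" signal in the support-case step.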

Most refreshes resolve once the upstream constraint clears. The runbook should specify who owns the case and what notifications go out at each stage.

Communicating with stakeholders during a failure

A stuck or failed refresh is rarely a silent problem. The test team is waiting for the sandbox, release plans depend on it, and individual contributors may be blocked. Keep stakeholders informed with concrete timestamps and next actions rather than vague reassurances. A short note that says "refresh stuck in Copying since Friday 6 PM, support case 12345678 open, ETA from support is 4 hours" is more useful than "we're working on it".

Quick recovery checklist

Identify whether the failure is post-copy script, capacity, or source instability.
For script failures, read the exception, fix the script, deploy to production, retry the refresh.
For capacity stalls, open a support case with org id and sandbox name.
For source instability, wait the incident out, then retry.
Document the resolution and update the runbook.

Sandbox refresh issues are visible and disruptive, but the recovery paths are well-understood once the failure mode is identified.
