Semi-Join
A semi-join is a SOQL query pattern that filters records in one object using a subquery against a related object placed inside an IN clause.
Definition
A semi-join is a SOQL query pattern that filters records in one object using a subquery against a related object placed inside an IN clause. The outer query keeps only the rows whose ID or foreign key field matches an ID returned by the inner subquery, so you fetch parents that have qualifying children without pulling the children themselves.
A common example is selecting accounts that have at least one closed-lost opportunity: SELECT Id, Name FROM Account WHERE Id IN (SELECT AccountId FROM Opportunity WHERE StageName = 'Closed Lost'). The database resolves the relationship in a single round trip. Its mirror image, the anti-join, swaps IN for NOT IN to return records that have no matching related rows.
How semi-joins filter parents by their children
The anatomy of a semi-join
A semi-join has two parts. The outer query selects from the object you actually want back, and the inner subquery, wrapped in parentheses after IN, queries a second object to produce a list of IDs. The platform runs the subquery first, collects those IDs, then returns outer rows whose left-side field appears in that list. You get the parents, not the children, which is the whole point of the pattern. The left side of the IN clause must be a single ID (primary key) or reference (foreign key) field. So Id IN (...) on Account works, and AccountId IN (...) on Contact works. The subquery on the right selects exactly one field, and that field is usually a foreign key that points back at the outer object. In the account-and-opportunity example, Opportunity.AccountId is the bridge that ties each child back to its parent. Get either side wrong and the query will not save. Think of it as asking a yes-or-no question about related data. Does this account have a closed-lost opportunity? The subquery answers yes or no for every account, and only the yes rows survive.
Semi-join versus the child relationship subquery
SOQL has two very different subquery shapes, and mixing them up is a frequent source of confusion. A semi-join puts the subquery in the WHERE clause and uses it only to filter the outer object. The data you get back is parent records, and the child object never appears in your results. The subquery is a filter, nothing more. A child relationship query, by contrast, puts the subquery in the SELECT clause, like SELECT Name, (SELECT Amount FROM Opportunities) FROM Account. That shape returns each account along with a nested list of its opportunities. You are pulling the children into the response, not filtering by them. Choose based on what you need back. If you only want the accounts and the opportunities are just a condition, use a semi-join in WHERE. If you need to read or display the opportunity fields alongside each account, use the child relationship subquery in SELECT. Many real queries combine both: a semi-join narrows which accounts come back, and a relationship subquery brings along the child rows you intend to show. Picking the right shape keeps both the result structure and the governor cost where you expect them.
Anti-joins and the NOT IN twin
Swap IN for NOT IN and a semi-join becomes an anti-join. Instead of returning parents that have a matching child, it returns parents that have none. SELECT Id, Name FROM Account WHERE Id NOT IN (SELECT AccountId FROM Opportunity WHERE StageName = 'Closed Won') finds accounts with no closed-won deal. This is how you locate gaps: contacts without cases, accounts without active contracts, products never ordered. Anti-joins carry one trap that semi-joins do not. If the inner subquery returns any null value in its selected field, NOT IN can behave in ways that surprise you, because comparing anything to null is not true. In practice the selected field in these subqueries is a foreign key, and rows where that key is null are simply not produced by the subquery, so the issue rarely bites. Still, it pays to know that null handling differs between the IN and NOT IN forms. One more rule the platform enforces: you cannot use the not-equals operator with these joins. Writing != would flip a semi-join into an anti-join or the reverse, so SOQL blocks it. Express the intent directly with IN or NOT IN instead.
The rules that keep a semi-join valid
Salesforce limits these joins so the query engine stays efficient. You can use at most two semi-join or anti-join subqueries in a single query, and no more than two IN or NOT IN statements in one WHERE clause. You cannot nest one of these joins inside another, and you cannot place a semi-join or anti-join in a subquery WHERE clause. They live only in the main WHERE. The strictest rule concerns the left operand. It must be a plain ID or reference field on the outer object, and it cannot traverse a relationship. That is why SELECT Id FROM Contact WHERE Account.Id IN (...) fails with the error Cannot use a foreign key for outer field for semi join. The dot notation walks a relationship, which is not allowed here. Rewrite it as WHERE AccountId IN (...) using the direct foreign key on Contact, and it works. The selected field inside the subquery is allowed to be a reference field, which gives you flexibility on the inner side. These boundaries feel picky at first, but they are predictable. Once you internalize single-field-in, single-ID-or-reference-out, no relationship on the left, most semi-join errors stop happening.
Why semi-joins beat joining in Apex
Without semi-joins, filtering parents by child criteria takes two queries and some glue code. First you query the children to collect their parent IDs, store those IDs in a Set, then run a second query like WHERE Id IN :accountIds against the parent. That works, and the bind-variable version is perfectly valid Apex. But it costs you two SOQL statements against the governor limit instead of one, plus the heap and CPU to build the Set. A semi-join folds both steps into a single statement that the database resolves internally. One SOQL query, no intermediate collection, less code to maintain. On a synchronous transaction capped at 100 SOQL queries, halving the count on a hot path matters, especially inside loops or triggers processing batches of records. There is a judgment call, though. A static list of IDs you already hold in memory is better passed as a bind variable, since the platform can use those IDs directly. A semi-join shines when the filtering set is itself defined by a query, when it would be wasteful to materialize every child ID in Apex just to feed the parent query. Reach for the semi-join when the relationship is the filter, and reach for the bind variable when you already have the concrete IDs.
Selectivity, indexes, and large data volumes
A semi-join is only as fast as the conditions inside it. The query optimizer tries to drive the join from the side that returns the fewest rows, and it leans on indexed fields to do that. If the subquery filters on an indexed column, such as a foreign key, an external ID, or a standard indexed field, the optimizer can resolve the match quickly even on millions of rows. If it filters on an unindexed field with low selectivity, the join can fall back to a full scan and time out. For large data volumes this is the difference between a query that returns in milliseconds and one that throws a non-selective query error. Use the Query Plan tool in the Developer Console to see how the optimizer scores your semi-join and whether it uses an index. Add custom indexes or restructure the filter when the plan looks expensive. Selectivity also interacts with sharing. On objects where the running user can see only a subset of rows, the optimizer must account for sharing, which can change which index it picks. Test semi-joins against production-scale data, not a sandbox with a few hundred records, because the plan that looks fine small can degrade badly at scale.
Trust & references
Cross-checked against the following references.
Straight from the source - Salesforce's reference material on Semi-Join.
Hands-on resources to go deeper on Semi-Join.
About the Author
Dipojjal Chakrabarti is a B2C Solution Architect with 29 Salesforce certifications and over 13 years in the Salesforce ecosystem. He runs salesforcedictionary.com to help admins, developers, architects, and cert/interview candidates sharpen their fundamentals. More about Dipojjal.
Test your knowledge
Q1. What is a Semi-Join in SOQL?
Q2. Why use semi-joins?
Q3. What do they replace?
Discussion
Loading discussion…