Returns and refunds aren't the same thing, and your fraud model needs to know
There is a small but important distinction at the data layer of every Shopify store that runs a returns app, and most fraud tools get it wrong by accident. The distinction is between a return and a refund.
A return is a structured record that says "the customer told us they want to send something back." On Shopify, that record exists when one of two things happens: the customer used the native Shopify Returns API (rare, because most stores route returns through a third-party app), or the merchant manually opened a return record from the order admin (also rare on stores with any volume).
A refund is a structured record that says "money moved from the merchant to the customer." On Shopify, refund records are created by a wider set of paths: a return that closes with a payout, a manual refund issued by the merchant from the orders page, an automatic refund triggered by a third-party app like Loop or Returnly, a chargeback the merchant decided not to fight, a goodwill credit issued via Shopify Flow, or a partial refund issued during a customer-service ticket.
Refunds that come out of a return are a subset of all refunds. Not the other way around.
This distinction has structural implications for any fraud tool that wants to use return-related data as a label. A fraud tool that watches the Shopify returns/created webhook only ever sees the return records that were created through the native Shopify path. The merchant who installed Loop a year ago and routes 90% of their return flow through Loop's API never produces a single returns/created event for those returns. The merchant's fraud-relevant data is hidden behind a third-party SaaS API that the fraud tool never talks to.
That blindness is the merchant-side problem this post is about.
What the third-party return apps do (and don't)
Loop, Returnly, AfterShip, ReturnGO, and the rest of the third-party return-management space all run their own return-flow UI on top of Shopify. The customer enters the third-party UI, picks the items to return, picks a reason, optionally pays for an exchange or upgrade, and the third-party app handles the merchant-side work. When the return closes, the third-party app calls Shopify's refund API to issue the refund. Shopify writes a refund record and fires the refunds/create webhook. Done.
What happens at the structured-data level is that Shopify ends up with a refund record but no return record. The return record lives inside Loop's own database, in Loop's own schema. A fraud tool that wants to know "did this customer return their last three orders?" cannot get that information from Shopify. It has to either talk to Loop's API, or it has to learn to read the refund records and treat them as return-equivalent.
Most fraud tools do neither. They watch returns/created, they see a sparse trickle of events from the small fraction of stores using native Shopify Returns, and they mark everything else as "no returns." Their fraud models train on a label that is wrong for the majority of the merchant population.
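To make the blind spot concrete, here is a minimal sketch of what each data source reports for the same customer. The StoreEvent shape and the countByCustomer helper are assumptions made for illustration; they are not Shopify's webhook payload format, only a simplified stand-in keyed by the two topic names used in this post.

```typescript
// Hypothetical, simplified event shape. NOT Shopify's actual webhook payload.
type StoreEvent = {
  topic: "returns/created" | "refunds/create";
  customerId: string;
  orderId: string;
};

// Count events per customer for a single topic.
function countByCustomer(
  events: StoreEvent[],
  topic: StoreEvent["topic"],
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    if (e.topic !== topic) continue;
    counts[e.customerId] = (counts[e.customerId] ?? 0) + 1;
  }
  return counts;
}

// A merchant whose returns run through a third-party app:
// Shopify only ever sees the refunds.
const thirdPartyMerchantEvents: StoreEvent[] = [
  { topic: "refunds/create", customerId: "c1", orderId: "o1" },
  { topic: "refunds/create", customerId: "c1", orderId: "o2" },
  { topic: "refunds/create", customerId: "c1", orderId: "o3" },
];

// What a returns-only tool sees versus what a refund-aware tool sees.
const seenAsReturns =
  countByCustomer(thirdPartyMerchantEvents, "returns/created")["c1"] ?? 0;
const seenAsRefunds =
  countByCustomer(thirdPartyMerchantEvents, "refunds/create")["c1"] ?? 0;
```

On this data the returns-only view reports zero events for the customer while the refund view reports three, which is the mislabeling described above.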
The label decision
When we built the fraud-scoring model that this series will get to in a later post (spec 201, refund-propensity ML), the first design decision was which database row would be the label.
The candidates were:
- ReturnRecord: the row written when a structured return event fires. Reliable shape, terrible coverage (most merchants never produce these).
- RefundRecord: the row written when a refund is issued. Reliable shape and good coverage (every refund path eventually hits the Shopify refund webhook, so we see them all).
- A composite "did this customer get money back" label derived from joining order, return, and refund tables.
We picked RefundRecord. Reasoning:
The label needs to be the same shape on every merchant. A model trained on ReturnRecord produces wildly different label distributions on a Shopify-Returns-only merchant versus a Loop merchant. A model trained on RefundRecord sees a consistent shape regardless of which return-management app the merchant uses.
The label needs to be monetary. Fraud-prevention math runs on dollars, not on action types. A return that closes without a refund is not a fraud signal in the same way that a return that closes with a $200 payout is. RefundRecord carries the dollar amount; ReturnRecord carries the action.
The label needs to be hard to fake. A merchant who wants to hide return data from a fraud tool can choose not to use Shopify's native returns flow. They cannot choose not to issue refunds. The refund is the irreducible thing.
The label needs to work across return-management migrations. A merchant who switches from Loop to ReturnGO halfway through their year produces an artificial discontinuity in ReturnRecord data. They produce no discontinuity in RefundRecord data because the refund webhook fires the same way regardless of which front-end UI generated the refund. The fraud model trained on refunds is migration-resilient.
Multi-currency
The hardest piece of the refund label is multi-currency. A merchant on Shopify Plus may issue refunds in USD, EUR, GBP, and CAD all in the same week, and the amount on a refund record is denominated in the customer's transaction currency, not a normalized currency.
Spec 142 made the call to skip and log multi-currency cases. When the refund's currency is not the shop's primary currency, the refund record is persisted but the label aggregation skips it. We log the skip count per shop and surface it as a data-quality flag. The reason for skipping rather than converting is that historical exchange-rate conversion is a moving target, and a label that depends on a third-party FX feed is a fragile label. Merchants whose refund currency differs from their primary currency are a small fraction of the merchant population. We are willing to lose those refunds from the training set rather than introduce currency-conversion noise.
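The skip-and-log rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the production aggregation: the LabelAggregate type, the aggregateRefundLabel function, and the shape of the row are hypothetical, and only the currency and totalRefunded field names come from this post.

```typescript
// Hypothetical row shape: only the fields this sketch needs.
type RefundRecord = {
  orderId: string;
  currency: string;      // currency code of the refund, e.g. "USD"
  totalRefunded: number; // amount expressed in `currency`
};

// Hypothetical aggregate; skippedMultiCurrency backs the data-quality flag.
type LabelAggregate = {
  totalRefunded: number;
  refundCount: number;
  skippedMultiCurrency: number;
};

function aggregateRefundLabel(
  refunds: RefundRecord[],
  primaryCurrency: string,
): LabelAggregate {
  const agg: LabelAggregate = {
    totalRefunded: 0,
    refundCount: 0,
    skippedMultiCurrency: 0,
  };
  for (const refund of refunds) {
    if (refund.currency !== primaryCurrency) {
      // Skip and log: the row stays persisted elsewhere, but it is
      // left out of the label and no FX conversion is attempted.
      agg.skippedMultiCurrency += 1;
      continue;
    }
    agg.totalRefunded += refund.totalRefunded;
    agg.refundCount += 1;
  }
  return agg;
}

// Usage: a USD shop with one EUR refund in the mix.
const example = aggregateRefundLabel(
  [
    { orderId: "o1", currency: "USD", totalRefunded: 200 },
    { orderId: "o2", currency: "EUR", totalRefunded: 80 },
    { orderId: "o3", currency: "USD", totalRefunded: 50 },
  ],
  "USD",
);
```

Here the EUR refund is counted in skippedMultiCurrency and contributes nothing to totalRefunded, so the label never mixes currencies.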
How third-party data still helps
Refusing to use third-party return data as the label does not mean refusing to use it as a feature. The model takes return data as one of many features. If a merchant uses Loop and we have read access to Loop's data via the Shopify storefront, the model gets a feature like "the customer initiated a return through the third-party app within 7 days of the order." That feature is informative. It just is not the label.
Spec 182 added a Shop.returnIntegrationMode field that records which return-management app the merchant uses. Shop-level switching between Loop, Returnly, ReturnGO, AfterShip, and native produces different feature availability. The model knows which features it can rely on for which merchants and ignores features that depend on a third-party app the merchant does not use.
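One way to picture the feature gating is a lookup from integration mode to the features the model may read. The mode values below are the ones listed for Shop.returnIntegrationMode; everything else (the feature names, the mapping of features to modes, and the availableFeatures and maskFeatures helpers) is a hypothetical illustration, not the actual feature set.

```typescript
type ReturnIntegrationMode =
  | "native" | "loop" | "returnly" | "aftership" | "returngo" | "unknown";

// Hypothetical feature names, invented for this sketch.
type FeatureName =
  | "refundCount90d"
  | "refundTotal90d"
  | "nativeReturnCount90d"
  | "thirdPartyReturnWithin7d";

type FeatureVector = Partial<Record<FeatureName, number>>;

// Refund-derived features exist for every merchant.
const ALWAYS_AVAILABLE: FeatureName[] = ["refundCount90d", "refundTotal90d"];

// Assumed mapping: which extra features each mode can supply.
function availableFeatures(mode: ReturnIntegrationMode): FeatureName[] {
  switch (mode) {
    case "native":
      return [...ALWAYS_AVAILABLE, "nativeReturnCount90d"];
    case "loop":
    case "returnly":
    case "aftership":
    case "returngo":
      return [...ALWAYS_AVAILABLE, "thirdPartyReturnWithin7d"];
    case "unknown":
      return [...ALWAYS_AVAILABLE];
  }
}

// Drop any feature the merchant's integration mode cannot back up.
function maskFeatures(raw: FeatureVector, mode: ReturnIntegrationMode): FeatureVector {
  const out: FeatureVector = {};
  for (const name of availableFeatures(mode)) {
    const value = raw[name];
    if (value !== undefined) out[name] = value;
  }
  return out;
}
```

Masking by mode keeps a stale or accidental value (for example a native return count on a Loop merchant) from leaking into the score.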
Spec 184 added a refund-as-return fallback for merchants with no return-management app at all. On those merchants, the refund record carries enough structured data to derive an inferred-return event: the refund line items tell us what came back, the refund timestamp tells us when. We treat the inferred return as a feature for fraud-engine purposes. We do not treat it as a label.
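A minimal sketch of the refund-as-return fallback, assuming a refund row that carries its line items. The type names, the inferReturnFromRefund function, and the choice to return null for refunds with no returned quantity are assumptions for illustration; the post only states that line items supply the "what" and the refund timestamp supplies the "when".

```typescript
// Hypothetical shapes: only the fields this sketch needs.
type RefundLineItem = { lineItemId: string; quantity: number };

type RefundWithLines = {
  id: string;
  orderId: string;
  createdAt: string; // ISO 8601 timestamp of the refund
  lineItems: RefundLineItem[];
};

type InferredReturn = {
  source: "inferred-from-refund"; // marks this as a feature input, never a label
  refundId: string;
  orderId: string;
  returnedAt: string;
  items: RefundLineItem[];
};

function inferReturnFromRefund(refund: RefundWithLines): InferredReturn | null {
  // Assumption: a refund with no refunded quantity (for example a goodwill
  // credit) tells us nothing about goods coming back, so nothing is inferred.
  const items = refund.lineItems.filter((li) => li.quantity > 0);
  if (items.length === 0) return null;
  return {
    source: "inferred-from-refund",
    refundId: refund.id,
    orderId: refund.orderId,
    returnedAt: refund.createdAt, // the refund timestamp stands in for "when"
    items,                        // the refund line items stand in for "what"
  };
}
```

Tagging the result with an explicit source makes it harder for downstream code to mistake an inferred return for a structured one.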
Engineer detail. The data-model side of this is straightforward.
RefundRecord is the source-of-truth row; it has a currency, totalRefunded, createdAt (the Shopify-side timestamp), and a foreign key to Order. ReturnRecord exists for merchants whose flow produces it and is treated as enrichment data. The join logic in app/lib/risk/context/buildFromRefund.server.ts handles three cases: refund with matching return record (rare on Loop merchants, common on Shopify-native merchants), refund with no matching return record (common everywhere), refund with multiple matching return records (multi-step return flow on apparel stores; we collapse to the latest matching return). The risk-engine context object carries both fields independently so signal evaluators can read whichever they need.
The Shop.returnIntegrationMode field has values native, loop, returnly, aftership, returngo, and unknown. The default is unknown until we detect a specific webhook signature or the merchant tells us during onboarding. The detection is heuristic: a merchant who has never produced a returns/created event in 90 days but has produced 50+ refund events is unknown (probably a third-party app, but we cannot identify which); a merchant who produces both is one of the four named integrations. The detection runs as a periodic cron, not on every webhook.
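The three join cases described in this section can be sketched as follows. This is not the contents of buildFromRefund.server.ts: the function name buildContextFromRefund, the RefundContext type, and the assumption that a return matches a refund when they share an order are all hypothetical, chosen only to show the collapse-to-latest rule.

```typescript
// Hypothetical row shapes, using the field names mentioned in this post.
type RefundRecord = {
  id: string;
  orderId: string;
  currency: string;
  totalRefunded: number;
  createdAt: string; // ISO 8601
};

type ReturnRecord = { id: string; orderId: string; createdAt: string };

// The context carries the refund and the (optional) return independently.
type RefundContext = {
  refund: RefundRecord;
  matchedReturn: ReturnRecord | null;
  matchCase: "single-return" | "no-return" | "multiple-returns";
};

function buildContextFromRefund(
  refund: RefundRecord,
  returns: ReturnRecord[],
): RefundContext {
  // Assumption: a return "matches" a refund when both point at the same order.
  const matches = returns.filter((r) => r.orderId === refund.orderId);

  // Case: refund with no matching return record.
  if (matches.length === 0) {
    return { refund, matchedReturn: null, matchCase: "no-return" };
  }

  // Cases: one match, or several collapsed to the latest by createdAt.
  const latest = matches.reduce((a, b) =>
    Date.parse(b.createdAt) > Date.parse(a.createdAt) ? b : a,
  );
  return {
    refund,
    matchedReturn: latest,
    matchCase: matches.length === 1 ? "single-return" : "multiple-returns",
  };
}
```

Recording which case applied, rather than only the matched row, lets a signal evaluator tell "no return existed" apart from "several returns were collapsed into one".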
What this means for the merchant
If your fraud tool says "we saw 0 returns from this customer this year" and the customer has been refunded on eight orders this year through Loop, the tool is telling you the truth as it sees it and lying about reality. The customer-history pane is wrong. The return-rate signal misses every fraud event that used the third-party app. The fraud chargeback that finally hits is "out of nowhere" only if you were looking at the wrong table.
The fix at the merchant level is to ask any fraud tool you evaluate which data source it uses for return labels. Refund-driven? Return-record-driven? Composite? "We watch the returns webhook"? The answer matters and most fraud tools do not advertise it.
Take-away
Returns and refunds are different rows in different tables produced by different APIs. Refunds are the universal monetary truth across return-management apps; returns are not. A fraud model that uses returns as the label is correct for the small slice of merchants on native Shopify Returns and silently broken for everyone else. We picked refunds as the label so the model works the same way on every merchant regardless of which return-management app they use.
If your team builds Loop integrations, ReturnGO integrations, or any other third-party returns app, the data-model angle here is one we would happily co-write a follow-up on. Reach out.
RefundSentry is an intelligence layer for Shopify return fraud. See pricing for plans during the private beta.