Day one of using a fraud tool shouldn't be a blank dashboard
Pick any fraud-prevention or returns-analytics SaaS app on the Shopify App Store. Install it. Watch the loading spinner. Read the welcome modal. Click through to the dashboard.
What you see, on every one of them, is some variation of: "Your dashboard is empty. Once you receive your first 50 orders we'll start showing insights." Or 100 orders. Or 30 days. Or "wait until your first chargeback comes in."
That is the standard SaaS install moment. It is also the standard SaaS first-week churn moment. The merchant evaluated the app expecting to see signal in their own data. They got a placeholder graphic and a wait-list. By the time the dashboard fills in, the seven-day trial is over.
The decision we made early at RefundSentry was that a merchant's first dashboard view should be their own historical fraud picture, not a placeholder. To get there, we backfill the merchant's last 6 to 24 months of orders, refunds, returns, and chargebacks at install time and run the risk engine over the entire backfilled set before the merchant sees their first screen.
What the backfill actually does
Three things have to happen at install time for a Shopify merchant to walk into a useful dashboard:
The merchant's historical orders need to be pulled from the Shopify Admin API and translated into the same row shape that our forward-from-install pipeline produces. That gets us volume.
The historical orders need to be enriched with the same fraud-relevant context that our forward-from-install pipeline produces: refund records, return records, chargebacks, customer aggregates, address fingerprints. That gets us the inputs the risk engine needs to score.
The risk engine has to run over every backfilled order with the merchant's calibrated threshold set. That gets us the merchant's actual zone distribution: for example, how many of their last 12 months of orders would have scored LOW, MEDIUM, or HIGH under today's engine. That distribution is what shows up on the dashboard.
A merchant who sells $300 streetwear at high return rates ends up with a backfill output that looks visibly different from a merchant who sells $30 cosmetics at low return rates. The fraud-zone histogram tells two different stories. Both are usable from day one.
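As a rough illustration of the third step, the zone distribution is a count of backfilled scores per zone. The sketch below assumes a numeric score and a two-cutoff threshold set; the Thresholds shape, the score scale, and the function names are hypothetical and not taken from RefundSentry's engine.

```typescript
// Hypothetical types: the real engine's score scale and threshold shape may differ.
type Zone = "LOW" | "MEDIUM" | "HIGH";

interface Thresholds {
  medium: number; // scores at or above this are at least MEDIUM
  high: number;   // scores at or above this are HIGH
}

function zoneFor(score: number, t: Thresholds): Zone {
  if (score >= t.high) return "HIGH";
  if (score >= t.medium) return "MEDIUM";
  return "LOW";
}

// Counts how many backfilled orders land in each zone for one merchant.
function zoneDistribution(
  scores: readonly number[],
  t: Thresholds,
): Record<Zone, number> {
  const counts: Record<Zone, number> = { LOW: 0, MEDIUM: 0, HIGH: 0 };
  for (const score of scores) {
    counts[zoneFor(score, t)] += 1;
  }
  return counts;
}
```

Because the thresholds are per merchant, the same list of scores can produce a different histogram for the streetwear shop than for the cosmetics shop.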
How the backfill stays sane
A naive "fetch every order ever placed" approach hits the Shopify API rate limits within the first hundred orders and, with no way to resume, never finishes. The backfill needs to be structured so it can run for hours on a high-volume merchant, recover from a failure halfway through, and not block the rest of the install flow.
The pipeline is built around two database concepts: a BackfillPlan row, and a set of BackfillStage rows that belong to it. Each stage has a kind (orders, refunds, returns, chargebacks, baseline-build, aging-inference) and a status (pending, in-progress, completed, failed, blocked-quality-check). Stages within a plan execute in a strictly ordered sequence. Orders before refunds, refunds before chargebacks, chargebacks before the merchant's risk baseline gets computed.
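A minimal sketch of that plan-and-stage model, assuming the stages are stored in execution order on the plan. The field names (shopId, attempts) and the nextRunnableStage helper are illustrative assumptions; only the kind and status values come from the description above.

```typescript
type StageKind =
  | "orders" | "refunds" | "returns" | "chargebacks"
  | "baseline-build" | "aging-inference";

type StageStatus =
  | "pending" | "in-progress" | "completed" | "failed" | "blocked-quality-check";

// Hypothetical row shapes; the real schema is not shown in this post.
interface BackfillStage {
  kind: StageKind;
  status: StageStatus;
  attempts: number;
}

interface BackfillPlan {
  shopId: string;
  stages: BackfillStage[]; // assumed to be stored in execution order
}

// Returns the single stage that may run next, or null when the plan is
// finished, already running a stage, or blocked for operator review.
function nextRunnableStage(plan: BackfillPlan): BackfillStage | null {
  for (const stage of plan.stages) {
    if (stage.status === "completed") continue; // never re-run finished work
    if (stage.status === "in-progress") return null;
    if (stage.status === "blocked-quality-check") return null;
    return stage; // first pending or failed stage
  }
  return null;
}
```

Walking the list from the front and stopping at the first unfinished stage is what makes the sequence strict: a later stage cannot start while an earlier one is pending, failed, or blocked.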
If a stage fails, only that stage retries. The orders stage doesn't re-run when the chargebacks stage hits a transient API error. If the merchant's data quality looks broken (no orders in the window, too few customers to compute a baseline, suspiciously uniform return rates), the plan blocks at the quality-gate stage and an operator reviews it manually before the dashboard goes live. That manual review path is rare, but the alternative (a confidently-wrong dashboard built on bad data) is worse.
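The quality gate can be pictured as a handful of structural checks over the backfilled orders. The sketch below is an assumption-heavy illustration: the BackfilledOrder shape, the checkBackfillQuality name, and the minimum-customer threshold are all hypothetical, and it does not attempt the return-rate uniformity check.

```typescript
// Hypothetical minimal order shape for the gate.
interface BackfilledOrder {
  customerId: string;
  createdAt: string; // ISO-8601 timestamp
}

type GateResult = { ok: true } | { ok: false; reason: string };

// Illustrative threshold, not a value from the real gate.
const MIN_DISTINCT_CUSTOMERS = 2;

function checkBackfillQuality(orders: readonly BackfilledOrder[]): GateResult {
  if (orders.length === 0) {
    return { ok: false, reason: "no orders in the backfill window" };
  }
  const customers = new Set(orders.map((o) => o.customerId));
  if (customers.size < MIN_DISTINCT_CUSTOMERS) {
    return { ok: false, reason: "too few distinct customers for a baseline" };
  }
  const timestamps = new Set(orders.map((o) => o.createdAt));
  if (orders.length > 1 && timestamps.size === 1) {
    return { ok: false, reason: "every order has the same timestamp" };
  }
  return { ok: true };
}
```

A failing result would move the plan to the blocked status for operator review rather than failing the stage, since retrying cannot fix structurally bad data.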
For high-volume merchants, the orders and refunds stages use Shopify's Bulk Operations API instead of paged GraphQL. Bulk Operations is asynchronous: we ask Shopify to export the merchant's last 24 months of orders to a JSONL file (one JSON object per line), Shopify provides a download URL once the operation finishes, and we stream-parse the file and write rows. A backfill that would have taken six hours via paged GraphQL takes 25 minutes via Bulk Operations.
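Bulk operation results arrive as line-delimited JSON, so the file can be parsed chunk by chunk without holding it all in memory. The class below is a sketch of that stream-parsing step only; JsonlLineParser and its method names are hypothetical, and the download and row-writing code is left out.

```typescript
type JsonRow = Record<string, unknown>;

// Incremental line-delimited JSON parser: feed it chunks as they download,
// get back the rows that are complete so far.
class JsonlLineParser {
  private buffer = "";

  // Accepts the next chunk of the file and returns every fully received row.
  push(chunk: string): JsonRow[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it for later.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as JsonRow);
  }

  // Call once the download ends to parse a final line with no trailing newline.
  flush(): JsonRow[] {
    const tail = this.buffer.trim();
    this.buffer = "";
    return tail.length > 0 ? [JSON.parse(tail) as JsonRow] : [];
  }
}
```

Chunk boundaries rarely line up with row boundaries, which is why the parser carries the partial last line over to the next push call.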
The post-backfill insights snapshot
When the last stage in the plan completes, the engine kicks off one final job: a post-backfill insights snapshot. That snapshot is the source of truth for the welcome dashboard. It contains the zone distribution, the top-firing signals across the merchant's history, the highest-risk customers identified retroactively, the chargeback density per month, the address-cluster count, and the return-reason distribution.
That snapshot is what we put in front of the merchant on their first dashboard view. It is computed once, persisted, and read directly by the welcome screen. Every later page-load reads the same snapshot row. This is a deliberate choice: at install time the engine has time to run a heavy aggregation query; at every later page-load we want sub-second response times. The split between "compute once at install" and "read everywhere later" is what keeps the live dashboard fast.
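The compute-once, read-everywhere split can be sketched as a build function that runs at the end of the backfill and a store that page loads read from. Everything named here (InsightsSnapshot, buildSnapshot, SnapshotStore) is a hypothetical stand-in, with an in-memory Map in place of the database row and only two of the snapshot's fields shown.

```typescript
type Zone = "LOW" | "MEDIUM" | "HIGH";

// Hypothetical subset of the snapshot; the real row carries more fields.
interface InsightsSnapshot {
  shopId: string;
  computedAt: string;
  totalOrders: number;
  zoneCounts: Record<Zone, number>;
}

// The heavy aggregation: runs once, when the last backfill stage completes.
function buildSnapshot(
  shopId: string,
  zones: readonly Zone[],
  now: Date,
): InsightsSnapshot {
  const zoneCounts: Record<Zone, number> = { LOW: 0, MEDIUM: 0, HIGH: 0 };
  for (const zone of zones) zoneCounts[zone] += 1;
  return {
    shopId,
    computedAt: now.toISOString(),
    totalOrders: zones.length,
    zoneCounts,
  };
}

// Stand-in for the persisted snapshot table, keyed by shop.
class SnapshotStore {
  private readonly rows = new Map<string, InsightsSnapshot>();

  save(snapshot: InsightsSnapshot): void {
    this.rows.set(snapshot.shopId, snapshot);
  }

  // Page loads call this; it is a key lookup, never a recomputation.
  read(shopId: string): InsightsSnapshot | undefined {
    return this.rows.get(shopId);
  }
}
```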
Why the ordering matters
There is a tempting alternative architecture where all stages run concurrently and the dashboard surfaces partial data as each stage completes. We tried that. It produces dashboards that are subtly inconsistent: the orders count says 2,400, the chargebacks chart says 12 chargebacks against orders that have not yet appeared, the customer-aggregate page says 380 customers because the customer-build stage finished first.
Concurrent stages also break the merchant's risk baseline. The baseline is a per-shop statistical profile (median order value, p90 order value, refund rate, chargeback rate per cohort) used by the risk engine to interpret a single order in context. The baseline cannot be computed correctly until orders, refunds, and chargebacks are all loaded. A concurrent backfill produces a baseline built on partial data, and the risk engine then scores live orders against the wrong baseline. We saw this happen on a real merchant during early development. Every order they took in the first 30 minutes after install scored too high; the scores then readjusted as the baseline finished building. The merchant ended up with a dashboard full of false-positive HIGH zones from their own backfill.
The fix was the strict stage ordering. No baseline computation until upstream stages have completed. No risk scoring against partial baselines.
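To make the dependency concrete, here is a sketch of a baseline build that refuses to run until the upstream stages are done. The OrderRow and ShopBaseline shapes, the nearest-rank percentile, and the computeBaseline name are assumptions for illustration; the chargeback-rate-per-cohort part of the baseline is omitted.

```typescript
// Hypothetical shapes; amounts are in cents to avoid floating-point money.
interface OrderRow {
  totalCents: number;
  refunded: boolean;
}

interface ShopBaseline {
  medianOrderCents: number;
  p90OrderCents: number;
  refundRate: number;
}

// Stages the baseline depends on, per the ordering described above.
const UPSTREAM_STAGES = ["orders", "refunds", "chargebacks"] as const;

// Nearest-rank percentile over an ascending list (pct is 1..100).
function percentile(sortedAsc: readonly number[], pct: number): number {
  if (sortedAsc.length === 0) throw new Error("percentile of an empty list");
  const rank = Math.max(1, Math.ceil((pct * sortedAsc.length) / 100));
  return sortedAsc[rank - 1] as number;
}

function computeBaseline(
  orders: readonly OrderRow[],
  completedStages: ReadonlySet<string>,
): ShopBaseline {
  // Refuse to build on partial data: every upstream stage must be complete.
  for (const kind of UPSTREAM_STAGES) {
    if (!completedStages.has(kind)) {
      throw new Error(`baseline blocked: ${kind} stage not completed`);
    }
  }
  if (orders.length === 0) throw new Error("baseline needs at least one order");
  const values = orders.map((o) => o.totalCents).sort((a, b) => a - b);
  const refundedCount = orders.filter((o) => o.refunded).length;
  return {
    medianOrderCents: percentile(values, 50),
    p90OrderCents: percentile(values, 90),
    refundRate: refundedCount / orders.length,
  };
}
```

With nearest-rank, the median of an even-sized list is the lower of the two middle values; an interpolating percentile would be an equally reasonable choice.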
Engineer detail. The ordering is enforced by assertStagesInOrder in app/lib/backfill/ordering.ts. The function takes the planned stage list and checks that for every consecutive pair, the predecessor's kind is allowed to come before the successor's kind per a hardcoded STAGE_PRECEDES map. Violations throw BackfillStageOrderingError before the plan persists. A contract test (tests/contract/backfill/stage-ordering.test.ts) walks every code path that constructs a plan and asserts the assertion fires. The reason the assertion exists rather than just relying on careful coding is that we have several plan-construction sites: install-time, manual-rerun, single-stage-retry, operator-override. Each one is small and reasonable in isolation, and any one of them is a place a future bug could emit stages in the wrong order. The assertion is the single chokepoint that catches drift in any of them.
Spec 171 added the first-backfill quality gate at the end of the orders stage. The gate looks at the volume, time distribution, and customer-uniqueness of what came in. If the data looks structurally broken (zero orders, single-customer-shop, identical-timestamps-everywhere), the gate marks the plan BLOCKED_QUALITY_CHECK and an operator reviews. The operator can override the block with a documented reason via applyOperatorOverride, which logs the override on the plan and lets it complete. We did not add a UI for this; the operator path runs from a small admin tool. The frequency is low enough that the manual loop is fine.
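For readers who want a picture of the ordering check, the sketch below reuses the names mentioned above (assertStagesInOrder, STAGE_PRECEDES, BackfillStageOrderingError), but the bodies are a guess at the shape and not the code in ordering.ts. In particular, the contents of the map and the position of the returns stage are assumptions.

```typescript
type StageKind =
  | "orders" | "refunds" | "returns" | "chargebacks"
  | "baseline-build" | "aging-inference";

// Assumed contents: for each kind, the kinds allowed to come directly after it.
const STAGE_PRECEDES: Record<StageKind, ReadonlySet<StageKind>> = {
  "orders": new Set<StageKind>([
    "refunds", "returns", "chargebacks", "baseline-build", "aging-inference",
  ]),
  "refunds": new Set<StageKind>([
    "returns", "chargebacks", "baseline-build", "aging-inference",
  ]),
  "returns": new Set<StageKind>([
    "chargebacks", "baseline-build", "aging-inference",
  ]),
  "chargebacks": new Set<StageKind>(["baseline-build", "aging-inference"]),
  "baseline-build": new Set<StageKind>(["aging-inference"]),
  "aging-inference": new Set<StageKind>(),
};

class BackfillStageOrderingError extends Error {
  constructor(predecessor: StageKind, successor: StageKind) {
    super(`stage "${predecessor}" may not run before "${successor}"`);
    this.name = "BackfillStageOrderingError";
  }
}

// Throws if any consecutive pair in the planned stage list is out of order.
function assertStagesInOrder(kinds: readonly StageKind[]): void {
  for (let i = 1; i < kinds.length; i++) {
    const predecessor = kinds[i - 1];
    const successor = kinds[i];
    if (predecessor === undefined || successor === undefined) continue;
    if (!STAGE_PRECEDES[predecessor].has(successor)) {
      throw new BackfillStageOrderingError(predecessor, successor);
    }
  }
}
```

A single-stage plan, as produced by a single-stage retry, passes trivially because it has no consecutive pairs, and a duplicated stage fails because no kind lists itself as a valid successor.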
What the merchant sees
The first dashboard view shows seven things in order: total backfilled orders, percent in each risk zone, the three most-fired risk signals over the merchant's history, a top-five list of historically-high-risk customers, the merchant's chargeback density per month, the most common return reason, and a recommendation block ("review the 18 HIGH-zone orders flagged in the backfill") that links into the held-orders view.
None of those numbers are placeholders. None of them say "wait." Every one of them comes from the merchant's own data. A merchant who installs at 11 AM has a real dashboard by 11:25 if their volume is normal, by 1 PM if their volume is enormous and the bulk operation takes a while.
Why this is hard for incumbents to copy
Backfilling 24 months of historical Shopify data sounds like a "set it up once and forget it" feature, but it is not. The Shopify Admin API is a moving target. Webhook payload shapes change. The Returns API is not a REST surface, only GraphQL. Bulk Operations have rate limits that are different from the regular GraphQL surface, and those limits change. Test coverage for backfill code has to handle "merchant has 0 orders," "merchant has 4 million orders," and a long tail of "merchant has 200 orders with one of them missing a created_at."
The forward-from-install path is comparatively easy: a single webhook fires, we ACK, we queue, we score, we persist. The backfill path is the engineering moat. It is also the difference between a merchant getting value on day 1 and a merchant churning before day 30.
Take-away
If your fraud or analytics tool installed in five minutes and then asked you to wait, the question is not whether the model is good. The question is whether the team building it ever shipped the boring engineering work that gets a merchant from install to a useful dashboard. That work happens at the data layer, not the model layer. It is the unglamorous prerequisite for everything else this series covers.
RefundSentry is an intelligence layer for Shopify return fraud. See pricing for plans during the private beta.