Retention Program Audit

Loopify's retention plays were running.
But were they working?

2,000 B2B SaaS accounts · $18,000 average ARR per account

Based on standard reporting, Loopify was about to defund its most effective retention play. CSM Outreach appeared to hurt retention (−3.4pp). After accounting for how accounts were selected into the play, it's actually the highest-impact intervention in the portfolio (+8.8pp churn reduction).

[Chart: estimated churn reduction by play, standard reporting vs. causal measurement. Error bars show 95% confidence intervals.]
CSM Outreach: standard −3.4pp, causal +8.8pp
Onboarding Email: standard +2.9pp, causal +13.2pp
In-App Nudge: standard +5.9pp, causal +5.1pp
Plays targeting higher-risk accounts show the largest gap between standard and causal estimates.

Program-level breakdown

CSM Outreach (under-credited)
Accounts flagged for outreach: 336
Standard reporting: −3.4pp
Causal measurement: +8.8pp (95% CI: +2.1pp to +15.1pp)
Naive ARR retained: −$203K
Causal ARR retained: +$530K
Reallocation opportunity: +$733K

Onboarding Email (under-credited)
Accounts enrolled: 489
Standard reporting: +2.9pp
Causal measurement: +13.2pp (95% CI: +7.6pp to +19.0pp)
Naive ARR retained: +$259K
Causal ARR retained: +$1.16M
Reallocation opportunity: +$899K

In-App Nudge (over-credited)
Accounts triggered: 605
Standard reporting: +5.9pp
Causal measurement: +5.1pp (95% CI: +0.8pp to +9.3pp)
Naive ARR retained: +$639K
Causal ARR retained: +$558K
Over-credited by: $81K

Total misattributed program credit (naive vs. causal): $1,551,160
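
The dollar figures line up with a simple conversion, assuming ARR retained is the churn-reduction effect applied to the accounts in each play at the $18,000 average ARR; that formula is an inference from the reported numbers, not a stated methodology. A minimal sketch using the CSM Outreach figures (small gaps vs. the reported values reflect rounding of the displayed effect sizes):

```python
# Illustrative conversion from a churn-reduction effect to ARR retained.
# Assumes ARR retained = effect * accounts * average ARR (an assumption consistent
# with the reported figures, not a documented formula).
AVG_ARR = 18_000           # average ARR per account ($)
accounts = 336             # accounts flagged for CSM Outreach
causal_effect = 0.088      # +8.8pp churn reduction (causal estimate)
naive_effect = -0.034      # -3.4pp apparent effect (standard reporting)

causal_arr = causal_effect * accounts * AVG_ARR    # ~$532K (reported +$530K)
naive_arr = naive_effect * accounts * AVG_ARR      # ~-$206K (reported -$203K)
reallocation = causal_arr - naive_arr              # ~$738K (reported +$733K)
print(f"causal ${causal_arr:,.0f} | naive ${naive_arr:,.0f} | reallocation ${reallocation:,.0f}")
```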
Overall churn rate: 17.2% across 2,000 accounts
ARR retained (standard reporting): $695K
ARR retained (causal measurement): $2.25M, after correcting for how accounts enter each play
Misattributed program credit: $1.55M

Why standard reporting gets this wrong

CSM Outreach is deployed to the highest-risk accounts, the ones most likely to churn regardless of whether the play touches them. Comparing churn rates between accounts that received the play and those that didn't ignores that difference entirely. The charts below make the problem visible.

Before matching
Accounts flagged for outreach had significantly higher churn risk, so a direct comparison would be misleading. [Chart: treated accounts show higher propensity scores than controls, indicating selection bias.]

After matching
Matched accounts now share a similar risk profile, so the comparison is valid. [Chart: treated and matched control accounts show nearly identical propensity-score distributions.]

Methodology

Propensity Score Matching (PSM)

We estimate each account's probability of being enrolled in each retention play using a logistic regression on observable covariates: account age, ARR tier, product usage score, number of seats, and industry vertical. Accounts in each play are then matched 1:1 to similar accounts that did not receive the intervention. The Average Treatment Effect on the Treated (ATT) is estimated from the matched sample.
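
A minimal sketch of that estimation pipeline, assuming a flat account-level table; the column names (e.g. usage_score, csm_outreach) are illustrative, not Loopify's actual schema:

```python
# Propensity-score-matching sketch: logistic propensity model, 1:1 nearest-neighbor
# matching, and an ATT estimate from the matched sample. Column names are assumed.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def att_via_psm(df: pd.DataFrame, treatment: str = "csm_outreach", outcome: str = "churned") -> float:
    covariates = ["account_age_months", "arr_tier", "usage_score", "seats", "vertical_code"]

    # 1. Model each account's probability of entering the play (the propensity score).
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[treatment])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]

    # 2. Match each treated account 1:1 to the nearest untreated account on propensity score.
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched_control = control.iloc[idx.ravel()]

    # 3. ATT: churn rate among treated minus churn rate among their matched controls.
    #    A negative value means the play reduced churn for the accounts it touched.
    return treated[outcome].mean() - matched_control[outcome].mean()
```

This sketch matches with replacement on the propensity score alone; in practice you would also check covariate balance after matching, which is what the before/after charts above summarize.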

Why this matters for retention measurement

Standard before/after comparisons assume accounts end up in a play at random. In practice, CS teams prioritize high-risk accounts, which creates a systematic bias that makes effective plays look weak or even harmful. Causal inference methods correct for this: the result is an estimate of what retention would have been had similarly at-risk accounts not received the intervention, a genuine counterfactual.
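
To make the selection problem concrete, here is a toy simulation with invented numbers (not Loopify data): a play that truly cuts churn by 10 points still looks mildly harmful in a naive treated-vs-untreated comparison when it is preferentially sent to high-risk accounts.

```python
# Toy simulation of selection bias. The play truly reduces churn by 10pp, but
# because higher-risk accounts are more likely to receive it, the naive
# treated-vs-untreated gap comes out positive (the play "looks" harmful).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
risk = rng.uniform(0.05, 0.60, n)        # baseline churn probability per account
treated = rng.random(n) < risk           # higher-risk accounts are more likely to get the play
churn_prob = np.clip(risk - 0.10 * treated, 0.0, 1.0)   # true effect: -10pp for treated accounts
churned = rng.random(n) < churn_prob

naive_gap = churned[treated].mean() - churned[~treated].mean()
print(f"naive treated-vs-untreated gap: {naive_gap:+.3f}")   # roughly +0.01 to +0.02
# Comparing treated accounts only to untreated accounts with similar `risk`
# recovers roughly the true -0.10 effect.
```

The same mechanism is what flips the sign on CSM Outreach in the breakdown above.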

See this analysis run on your retention plays.

We run the same analysis on your retention plays and sequences. You find out what's actually working, what's being misread, and where to put your time and money instead.

Book a 30-Minute Review