Retention Program Audit

Loopify's retention plays were running.
But were they working?

2,000 B2B SaaS accounts · $18,000 average ARR per account

Based on standard reporting, Loopify was about to defund its most effective retention play. CSM Outreach appeared to hurt retention (−3.4pp). After accounting for how accounts were selected into the play, it's actually the highest-impact intervention in the portfolio (+8.8pp churn reduction).

[Chart: estimated churn reduction by play, standard reporting vs. causal measurement. Error bars show 95% confidence intervals.]
CSM Outreach: standard −3.4pp, causal +8.8pp
Onboarding Email: standard +2.9pp, causal +13.2pp
In-App Nudge: standard +5.9pp, causal +5.1pp
Plays targeting higher-risk accounts show the largest gap between standard and causal estimates.

Program-level breakdown

CSM Outreach (under-credited)
Accounts flagged for outreach: 336
Standard reporting: −3.4pp
Causal measurement: +8.8pp (95% CI: +2.1pp to +15.1pp)
Naive ARR retained: −$203K
Causal ARR retained: +$530K
Reallocation opportunity: +$733K

Onboarding Email (under-credited)
Accounts enrolled: 489
Standard reporting: +2.9pp
Causal measurement: +13.2pp (95% CI: +7.6pp to +19.0pp)
Naive ARR retained: +$259K
Causal ARR retained: +$1.16M
Reallocation opportunity: +$899K

In-App Nudge (over-credited)
Accounts triggered: 605
Standard reporting: +5.9pp
Causal measurement: +5.1pp (95% CI: +0.8pp to +9.3pp)
Naive ARR retained: +$639K
Causal ARR retained: +$558K
Over-credited by: $81K

Total misattributed program credit (naive vs. causal): $1,551,160
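
The dollar figures line up with a simple conversion, assuming ARR retained is the churn-reduction effect applied to the accounts in each play at the $18,000 average ARR; that formula is an inference from the reported numbers, not a stated methodology. A minimal sketch using the CSM Outreach figures (small gaps vs. the reported values reflect rounding of the displayed effect sizes):

```python
# Illustrative conversion from a churn-reduction effect to ARR retained.
# Assumes ARR retained = effect * accounts * average ARR (an assumption consistent
# with the reported figures, not a documented formula).
AVG_ARR = 18_000           # average ARR per account ($)
accounts = 336             # accounts flagged for CSM Outreach
causal_effect = 0.088      # +8.8pp churn reduction (causal estimate)
naive_effect = -0.034      # -3.4pp apparent effect (standard reporting)

causal_arr = causal_effect * accounts * AVG_ARR    # ~$532K (reported +$530K)
naive_arr = naive_effect * accounts * AVG_ARR      # ~-$206K (reported -$203K)
reallocation = causal_arr - naive_arr              # ~$738K (reported +$733K)
print(f"causal ${causal_arr:,.0f} | naive ${naive_arr:,.0f} | reallocation ${reallocation:,.0f}")
```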
Overall churn rate: 17.2% across 2,000 accounts
ARR retained (standard reporting): $695K
ARR retained (causal measurement): $2.25M, after correcting for how accounts enter each play
Misattributed program credit: $1.55M

Why standard reporting gets this wrong

CSM Outreach is deployed to the highest-risk accounts, the ones most likely to churn regardless of whether the play touches them. Comparing churn rates between accounts that received the play and those that didn't ignores that difference entirely. The charts below make the problem visible.

Before matching
Accounts flagged for outreach had significantly higher churn risk, so a direct comparison would be misleading. [Chart: treated accounts show higher propensity scores than controls, indicating selection bias.]

After matching
Matched accounts now share a similar risk profile, so the comparison is valid. [Chart: treated and matched control accounts show nearly identical propensity-score distributions.]

Methodology

Propensity Score Matching (PSM)

We estimate each account's probability of being enrolled in each retention play using a logistic regression on observable covariates: account age, ARR tier, product usage score, number of seats, and industry vertical. Accounts in each play are then matched 1:1 to similar accounts that did not receive the intervention. The Average Treatment Effect on the Treated (ATT) is estimated from the matched sample.
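
A minimal sketch of that estimation pipeline, assuming a flat account-level table; the column names (e.g. usage_score, csm_outreach) are illustrative, not Loopify's actual schema:

```python
# Propensity-score-matching sketch: logistic propensity model, 1:1 nearest-neighbor
# matching, and an ATT estimate from the matched sample. Column names are assumed.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def att_via_psm(df: pd.DataFrame, treatment: str = "csm_outreach", outcome: str = "churned") -> float:
    covariates = ["account_age_months", "arr_tier", "usage_score", "seats", "vertical_code"]

    # 1. Model each account's probability of entering the play (the propensity score).
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[treatment])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]

    # 2. Match each treated account 1:1 to the nearest untreated account on propensity score.
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched_control = control.iloc[idx.ravel()]

    # 3. ATT: churn rate among treated minus churn rate among their matched controls.
    #    A negative value means the play reduced churn for the accounts it touched.
    return treated[outcome].mean() - matched_control[outcome].mean()
```

This sketch matches with replacement on the propensity score alone; in practice you would also check covariate balance after matching, which is what the before/after charts above summarize.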

Why this matters for retention measurement

Standard before/after comparisons assume accounts end up in a play at random. In practice, CS teams prioritize high-risk accounts, which creates a systematic bias that makes effective plays look weak or even harmful. Causal inference methods correct for this: the result is an estimate of what retention would have been had similarly at-risk accounts not received the intervention, a genuine counterfactual.
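
To make the selection problem concrete, here is a toy simulation with invented numbers (not Loopify data): a play that truly cuts churn by 10 points still looks mildly harmful in a naive treated-vs-untreated comparison when it is preferentially sent to high-risk accounts.

```python
# Toy simulation of selection bias. The play truly reduces churn by 10pp, but
# because higher-risk accounts are more likely to receive it, the naive
# treated-vs-untreated gap comes out positive (the play "looks" harmful).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
risk = rng.uniform(0.05, 0.60, n)        # baseline churn probability per account
treated = rng.random(n) < risk           # higher-risk accounts are more likely to get the play
churn_prob = np.clip(risk - 0.10 * treated, 0.0, 1.0)   # true effect: -10pp for treated accounts
churned = rng.random(n) < churn_prob

naive_gap = churned[treated].mean() - churned[~treated].mean()
print(f"naive treated-vs-untreated gap: {naive_gap:+.3f}")   # roughly +0.01 to +0.02
# Comparing treated accounts only to untreated accounts with similar `risk`
# recovers roughly the true -0.10 effect.
```

The same mechanism is what flips the sign on CSM Outreach in the breakdown above.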

See this analysis run on your retention plays.

We run the same analysis on your retention plays and sequences. You find out what's actually working, what's being misread, and where to put your time and money instead.

Book a 30-Minute Review