Experiment: Copilot Smart Recap V2 — Suggested Action Items

Progression: Suggested Action Items — V2 Smart Recap with PPS & CIQ

Suggests follow-up action items at the end of every Copilot chat session (Drafting Agent for paid Copilot users, vanilla BizChat for non-Copilot users). The experiment measures uptake of suggested actions and the downstream effect on retention and upsell across Consumer and Commercial.


Generative Insights

Based on aggregated product experience

AI-generated content may be incorrect

Recommendation: Iterate

Copilot recommends: Iterate. The variant improves engagement and upsell, but one stat-sig retention regression (Consumer 7-day) and flat Consumer engagement call for more iteration before a full ship. Advisory only: the human owner makes the call.

▲ Pros

Engagement (Commercial): +3.1%, stat-sig improvement on suggested-action acceptance rate.
Upsell (Consumer): Copilot Pro upgrade flow conversion +1.8%, stat-sig.
No regression on the primary North Star metric (chat satisfaction) in either segment.

► Cons

Engagement (Consumer): flat. The variant did not move Consumer engagement as anticipated.
Daily session count (Commercial): slightly negative (−0.6%, not stat-sig).
Ship-readiness scorecard: 9 of 12 guardrails passing, below the historical bar of 11 of 12.

▼ Risks

7-day retention (Consumer): the −0.42% regression is stat-sig and triggers a Fail guardrail, so the Ship gate is engaged.
Sample size for the Consumer Pro segment is below the recommended power (~62%).
Holiday seasonality may be inflating the Commercial engagement gain.

R-07 — Computed from Expedite gold/silver-layer data only. No external retrieval.

⊘ No scorecard executed for this slice.

No Generative Insights to display. Pick a Ring outside SDF/MSIT and an Iteration not in {1, 3, 5, 7} to view results.

💡 Decision

Based on aggregated product experience

Owner decision

✓ Ship

✓ Iterate

✓ Stop

✓ Hold


Iterate is the current owner-proposed decision, and the Copilot recommendation is also Iterate. Ship remains available as an override for legal, brand, or business reasons; see the warning below.

⛔ Ship is not recommended. One or more guardrail metrics are failing for this slice. You may still ship for legal, brand, or business reasons — document the override below for the audit trail.

Decision Rationale * required

Required only when your decision differs from Copilot's recommendation (R-04).

⚠ Guardrail Override — required when shipping despite a failing guardrail

Failing guardrail: 7-day retention (Consumer) · −0.42% · BH-significant at q = 0.012 (FDR ≤ 0.05, family m = 19, R-21). Document the override below — all three fields are required for audit.

Why is this acceptable? *

Mitigation *

Followups *

3 of 3 override fields required · 0 complete

Decision Rationale required to save.
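
The save rules described in this form can be sketched as a small validation function. This is a minimal illustration, not the portal's actual logic: the names `GuardrailOverride` and `validate_decision` are hypothetical, and the sketch assumes the rationale is required only when the decision differs from Copilot's recommendation (R-04) and that all three override fields are required when shipping with a failing guardrail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GuardrailOverride:
    """Hypothetical container for the three audit fields of an override."""
    why_acceptable: str = ""
    mitigation: str = ""
    followups: str = ""

    def completed_fields(self) -> int:
        # A field counts as complete when it holds non-whitespace text.
        values = (self.why_acceptable, self.mitigation, self.followups)
        return sum(bool(v.strip()) for v in values)


def validate_decision(
    decision: str,
    copilot_recommendation: str,
    failing_guardrails: List[str],
    rationale: str = "",
    override: Optional[GuardrailOverride] = None,
) -> List[str]:
    """Return blocking messages; an empty list means the decision can be saved."""
    errors: List[str] = []
    # Assumed rule (R-04): rationale only when the owner disagrees with Copilot.
    if decision != copilot_recommendation and not rationale.strip():
        errors.append("Decision Rationale required to save.")
    # Assumed rule: shipping with a failing guardrail needs all three fields.
    if decision == "Ship" and failing_guardrails:
        done = override.completed_fields() if override else 0
        if done < 3:
            errors.append(f"3 of 3 override fields required · {done} complete")
    return errors


if __name__ == "__main__":
    failing = ["7-day retention (Consumer)"]
    print(validate_decision("Iterate", "Iterate", failing))
    print(validate_decision("Ship", "Iterate", failing))
```

Under these assumptions, keeping Iterate saves without extra input, while choosing Ship for this slice is blocked until both the rationale and the three override fields are filled in.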

Supporting Evidence

🔗 Scorecard ↗ (Expedite · 12 metrics · refreshed 6h ago)
🔗 Experiment Portal ↗ (access may be restricted)
🔗 PM design doc ↗ (Smart Recap V2 — Suggested Actions)
🔗 Telemetry dashboard ↗ (Geneva · Engagement & Retention)
📋 Pre-launch eval set — Not linked

⊘ No scorecard for this slice: no ship decision can be made.

The Ring/Iteration combination you selected was not flighted, so no scorecard, guardrail metrics, or Copilot recommendation exists. Decision controls are disabled until you pick a slice with executed data.

Stat-sig regression · Stat-sig improvement · Flat / non-stat-sig · Awaiting power
σ Significance: Benjamini-Hochberg FDR ≤ 0.05 (R-21)
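
For readers unfamiliar with the significance rule named here, the standard Benjamini-Hochberg adjustment can be sketched as follows. This is a generic textbook version, not the scorecard pipeline's code; the function names and the example p-values are illustrative assumptions and are not taken from the scorecard.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


def significant(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag each metric whose adjusted q-value is at or below the FDR level."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Hypothetical raw p-values for a small family of metrics.
    example = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(example))
    print(significant(example))
```

A metric is reported as stat-sig when its q-value is at or below 0.05. The family size m is the number of p-values passed in, so the same raw p-value can become non-significant as more metrics join the family.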

⊘ No guardrail metrics for this slice.

No scorecard has been executed for the selected Ring × Iteration combination. Guardrail evaluation requires a scorecard run; pick a slice with executed data to view Consumer and Commercial guardrails.

📊 Consumer Guardrails

R-05 · R-08 · R-10

▼ Retention

7-day retention −0.42% 0.30%

28-day retention +0.18% 0.15%


14-day retention 0% 0.25%

Active days per user (30d) 0% 0.50%

▲ Upsell

Copilot Pro upgrade conversion +1.82% 0.80%

Trial start rate +0.74% 0.50%

⚡ Engagement

Suggested action acceptance rate +2.14% 1.00%


Sessions per user / day 0% 1.20%

Messages per session 0% 0.90%

Time-to-first-action 0% 1.50%

Action item completion rate 0% 1.00%

📊 Commercial Guardrails

R-05 · R-08 · R-10

▼ Retention

7-day retention +0.21% 0.18%

28-day retention +0.34% 0.25%


14-day retention 0% 0.30%

Tenant active rate 0% 0.40%

Active days per user (30d) 0% 0.55%

▲ Upsell

Not applicable — Upsell metrics are not tracked for Commercial tenants in this experiment.

⚡ Engagement

Suggested action acceptance rate +3.06% 1.10%

Daily session count 0% 1.30%


Messages per session 0% 0.95%

Time-to-first-action 0% 1.60%

Action item completion rate 0% 1.10%

● Active

📋 Experiment Details

R-01

👥 Owner

Priya Subramanian

🧑‍🤝‍🧑 Users

1,250,402

📅 Started

4/27/2026

🏢 Organization

M365 Copilot Pro

Group: m365–copilot–consumer
Ring: WW
Stage: WW
Iteration: 10
App: Copilot App
Initiative: Smart Recap
Lifecycle: Engagement
Scorecard Length: 14 days
Treatment: 50%
Control: 50%
Audience: Consumer + Commercial
DS Partner: Marcus Hill
Last Refresh: 6 hours ago

🕒 Decision History

R-11 · R-20 · preview

14 days ago

Priya Subramanian — started experiment at 50/50 allocation.
3 days ago

Marcus Hill (DS) — flagged 7-day retention regression in Consumer cohort.
today

Copilot — generated recommendation: Iterate.
30 days ago
2026-04-13T16:42:08Z

Override David Lydston — shipped V1 progression against Copilot verdict.

Actor: David Lydston <dalyds@microsoft.com>
Timestamp (UTC): 2026-04-13T16:42:08Z
Original verdict: Iterate (Copilot recommendation at override time)
Override decision: Ship
Why acceptable: Build keynote demo dependency; narrow 5% audience, two-week sunset, no upmarket exposure.
Mitigation: Daily metric review by DS partner; auto-rollback rule on any >1% retention regression.
Followups: Bug 8423218 (rollback playbook); Workitem 8423219 (re-run with corrected sample size).
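
The mitigation in this override record mentions an auto-rollback rule on any retention regression greater than 1%. A minimal sketch of such a check is shown below. The function name, the input shape (metric name mapped to percentage delta), and the strict "greater than" comparison are assumptions for illustration; the record does not describe how the rule was actually implemented.

```python
from typing import Dict


def should_auto_rollback(
    retention_deltas_pct: Dict[str, float],
    threshold_pct: float = 1.0,
) -> bool:
    """Return True if any retention metric regressed by more than the threshold.

    Deltas are treatment-vs-control changes in percent; negative means regression.
    """
    return any(delta < -threshold_pct for delta in retention_deltas_pct.values())


if __name__ == "__main__":
    # Hypothetical daily review input.
    today = {"7-day retention": -0.42, "28-day retention": 0.18}
    print(should_auto_rollback(today))
```

Under this reading, a regression of exactly 1% would not trigger a rollback, because the rule is stated as ">1%".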

⊘ No decision history for this slice.

Decisions are recorded per scorecard. Since no scorecard exists for this Ring × Iteration, there are no historical decisions or overrides to show.