
Cut flaky CI failures by at least 50% in 30 days, or we continue Core at no additional core fee (capped at 60 days).
Measurable, unambiguous, and transparent — here's exactly how we define FFR, baseline, success, prerequisites, and exclusions.
Who this is for
- B2B SaaS teams with 15–150 engineers
- GitHub Actions or GitLab CI as primary CI
- ≥ 500 pipelines in the last 28 days (or equivalent)
Outcome we guarantee
A ≥ 50% reduction in your flaky‑failure rate (FFR) by Day 30. Otherwise, we continue our Core service at no additional core fee until the target is met, for up to 60 days.
Definitions
Flaky Test
A test/job that can pass and fail with no code or environment change between runs.
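As an illustration of this definition, here is a minimal sketch that flags tests with mixed outcomes on an identical commit and environment. The record shape (`test_name`, `commit_sha`, `env_id`, `passed`) is a hypothetical one chosen for the example, not a format defined by this offer.

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Return names of tests that both passed and failed with nothing changed.

    `runs` is an iterable of (test_name, commit_sha, env_id, passed) tuples.
    This tuple layout is an assumption made for the example.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, env_id, passed in runs:
        outcomes[(test_name, commit_sha, env_id)].add(bool(passed))
    # A key that saw both True and False had mixed results on the same
    # commit and environment, which matches the definition of a flaky test.
    return sorted({key[0] for key, seen in outcomes.items() if len(seen) == 2})
```

A test that fails on one commit and passes on a later one is not flagged, because the code changed between the two runs.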
Flaky Failure Event (FFE)
A pipeline failure attributable to flakiness or unstable CI, confirmed by rerun/pass or known fingerprints.
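The two confirmation routes (rerun/pass or a known fingerprint) could be sketched roughly as follows. The `PipelineRun` fields and the example fingerprint patterns are hypothetical; a real fingerprint set would be built from your own CI logs.

```python
import re
from dataclasses import dataclass

# Hypothetical fingerprints, for illustration only.
KNOWN_FINGERPRINTS = [
    re.compile(r"connection reset by peer", re.IGNORECASE),
    re.compile(r"timed out waiting for", re.IGNORECASE),
]

@dataclass
class PipelineRun:
    commit_sha: str
    passed: bool
    log_excerpt: str = ""

def is_flaky_failure_event(failed, reruns):
    """Classify one pipeline run as an FFE (True) or not (False)."""
    if failed.passed:
        return False  # only failures can be FFEs
    # Route 1: a rerun of the same commit passed.
    rerun_passed = any(
        r.passed and r.commit_sha == failed.commit_sha for r in reruns
    )
    # Route 2: the failure log matches a known flakiness fingerprint.
    fingerprint_hit = any(p.search(failed.log_excerpt) for p in KNOWN_FINGERPRINTS)
    return rerun_passed or fingerprint_hit
```

Failures that match neither route stay out of the FFE count in this sketch, so legitimate regressions are not counted as flakiness.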
Flaky Failure Rate (FFR)
The rate at which FFEs occur, computed daily and aggregated over the window being measured (the baseline window or Day 1–30).
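The definition above does not spell out the denominator or the aggregation method. As an illustrative sketch only, the code below assumes FFR = FFEs ÷ total pipeline runs, and aggregates a window by pooling counts rather than averaging daily rates. Both choices are assumptions; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def daily_ffr(runs):
    """Map each day to (FFEs that day) / (pipeline runs that day).

    `runs` is a list of (day, is_ffe) pairs, one per pipeline run.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [ffe_count, run_count]
    for day, is_ffe in runs:
        totals[day][0] += int(is_ffe)
        totals[day][1] += 1
    return {day: ffe / n for day, (ffe, n) in sorted(totals.items())}

def window_ffr(runs, start, end):
    """Pooled FFR over [start, end]: total FFEs / total runs in the window."""
    in_window = [bool(is_ffe) for day, is_ffe in runs if start <= day <= end]
    return sum(in_window) / len(in_window) if in_window else 0.0
```

Pooling weights busy days more heavily than quiet ones. Averaging the daily rates instead would weight every day equally, and the two methods can give different results when pipeline volume varies.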
Baseline Window
The 28 days before Day 1, or the period covering the most recent 500 pipelines, whichever is longer.
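One way to read "whichever is longer" is to compute both candidate start dates and keep the earlier one. The sketch below assumes a list of pipeline start timestamps. The function name and parameters are hypothetical, and treating fewer than 500 prior runs as "use all available history" is also an assumption.

```python
from datetime import datetime, timedelta

def baseline_start(day1, pipeline_times, min_days=28, min_pipelines=500):
    """Return the start of the baseline window, which ends at Day 1.

    `pipeline_times` lists the start time of every known pipeline run.
    """
    by_days = day1 - timedelta(days=min_days)
    prior = sorted(t for t in pipeline_times if t < day1)
    recent = prior[-min_pipelines:]  # up to the 500 most recent runs
    by_count = recent[0] if recent else by_days
    # The earlier start date gives the longer window.
    return min(by_days, by_count)
```

For a team running more than 500 pipelines in 28 days, the 28-day rule sets the window. For a quieter repository, the 500-pipeline rule extends the window further back.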
Success criteria
- Any 7‑day window within Day 1–30 has FFR ≤ 50% of baseline, or
- The Day 1–30 average FFR ≤ 50% of baseline
- Evidence: dashboards plus a CSV/JSON export, provided at handover
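The two FFR criteria above can be checked mechanically. The sketch below assumes a list of daily FFR values for Day 1–30 and, for simplicity, treats a window's FFR as the unweighted mean of its daily values. That aggregation choice and the function name are assumptions, not the contractual method.

```python
def sprint_succeeded(baseline_ffr, daily_ffr, target_ratio=0.5, window=7):
    """Return True if either success criterion is met.

    `daily_ffr` holds one FFR value per day for Day 1..30.
    """
    if not daily_ffr:
        return False
    target = baseline_ffr * target_ratio
    # Criterion 1: any 7 consecutive days whose mean FFR is at or below target.
    any_window = any(
        sum(daily_ffr[i:i + window]) / window <= target
        for i in range(len(daily_ffr) - window + 1)
    )
    # Criterion 2: the mean FFR across the whole period is at or below target.
    overall = sum(daily_ffr) / len(daily_ffr) <= target
    return any_window or overall
```

For example, with a baseline FFR of 10%, a sprint that sits at 8% for 23 days and then holds 4% for the final 7 days meets the first criterion even though its 30-day average stays above 5%.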
Prerequisites
- Least‑privilege access (read access, plus permission to open PRs for changes)
- PR reviews within 2 business days
- No planned CI migrations or outages during the sprint
- 1–2 hours/week from your engineering contact(s)
Exclusions
- Legitimate regressions introduced by code changes
- Provider/infra incidents (CI, cloud, quota, region)
- Major framework rewrites started during the sprint
- Org policies that block the rollout of gates or quarantines
What we deliver
Week‑1
Baseline & Readiness Index; PASS/WARN/FAIL gate on PRs; Top‑5 fixes prepped; telemetry online.
Week‑2
Fingerprints; quarantines; rules for noisy suites/jobs; coaching.
Week‑3/4
Fixes & adoption; enforce; SOPs; 30‑/90‑day plan; handover.
You keep everything — rules, scripts, dashboards, SOPs — in your repos.
How we measure
- FFEs: confirmed by rerun/pass or fingerprint
- Baseline = last 28 days or 500 pipelines (whichever is longer)
- Success = any 7‑day window ≤ 50% of baseline, or Day 1–30 average ≤ 50% of baseline