UnflakeOpsUnflakeOps
Read-only access • PR-based changes • You own everything
Guarantee

Cut flaky CI failures by 50%+ in 30 days — or we continue Core at no additional core fee (cap 60 days).

Measurable, unambiguous, and transparent — here's exactly how we define FFR, baseline, success, prerequisites, and exclusions.

Who this is for

  • B2B SaaS teams with 15–150 engineers
  • GitHub Actions or GitLab CI as primary CI
  • ≥ 500 pipelines in the last 28 days (or equivalent)

Outcome we guarantee

A ≥ 50% reduction in your flaky‑failure rate (FFR) by Day 30 — otherwise we continue our Core service at no additional core fee for up to 60 days until we do.

Definitions

Flaky Test

A test/job that can pass and fail with no code or environment change between runs.

Flaky Failure Event (FFE)

A pipeline failure attributable to flakiness or unstable CI, confirmed by rerun/pass or known fingerprints.

Flaky Failure Rate (FFR)

FFR = FFEs ÷ total pipelines

Computed daily and aggregated over the window.

Baseline Window

Last 28 days prior to Day 1, or the most recent 500 pipelines, whichever is longer.

Success criteria

Prerequisites

  • Least‑privilege access (read + PRs for changes)
  • PR reviews within 2 business days
  • No planned CI migrations/outages during sprint
  • 1–2 hours/week from your engineering contact(s)

Exclusions

  • Legitimate regressions introduced by code changes
  • Provider/infra incidents (CI, cloud, quota, region)
  • Major framework rewrites started during sprint
  • Org policies blocking gates/quarantines rollout

What we deliver

Week‑1

Baseline & Readiness Index; PASS/WARN/FAIL gate on PRs; Top‑5 fixes prepped; telemetry online.

Week‑2

Fingerprints; quarantines; rules for noisy suites/jobs; coaching.

Week‑3/4

Fixes & adoption; enforce; SOPs; 30‑/90‑day plan; handover.

You keep everything — rules, scripts, dashboards, SOPs — in your repos.

How we measure

Formula
FFR = FFEs ÷ total pipelines
  • FFEs: confirmed by rerun/pass or fingerprint
  • Baseline = last 28 days or 500 pipelines (whichever is longer)
  • Success = any 7‑day window ≤ 50% baseline, or Day1–30 average ≤ 50%

Process (at a glance)

DevPull RequestGate ScorePASS · WARN · FAILMergeQuarantine/CoachWeek‑1Baseline + GatesWeek‑2Fingerprints + QuarantineWeek‑3Fixes + AdoptionWeek‑4Enforce + Handover