The fundamental law nobody talks about
Most organisations approach data quality as a binary problem: either the data is "clean" or it isn't. Twenty-five years of working inside data programmes — from NHS patient records to customer identity systems processing millions of rows — has taught me something different. Data quality is an economic problem, and it follows the same diminishing returns curve as any other investment.
The first pass of data cleansing is transformative. Deduplication, standardisation, fixing null values and obvious errors — this work typically lifts data quality from somewhere around 60–70% to 85–90%. It's systematic, largely automatable, and the return on investment is extraordinary.
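As a minimal sketch of what that first, largely automatable pass can look like (assuming a pandas DataFrame of customer records with illustrative column names), something like this covers standardisation, obvious-error fixes, and deduplication:

```python
import pandas as pd

def first_pass_clean(df: pd.DataFrame) -> pd.DataFrame:
    """First-pass cleansing: standardise, fix obvious errors, deduplicate.

    Column names (customer_id, email, postcode, joined_on) are illustrative.
    """
    out = df.copy()

    # Standardise free-text fields: trim whitespace, normalise case
    out["email"] = out["email"].str.strip().str.lower()
    out["postcode"] = out["postcode"].str.strip().str.upper()

    # Fix obvious errors: empty strings and placeholder values become proper nulls
    out = out.replace({"": pd.NA, "N/A": pd.NA, "UNKNOWN": pd.NA})

    # Parse dates, coercing the unparseable to null rather than keeping junk
    out["joined_on"] = pd.to_datetime(out["joined_on"], errors="coerce")

    # Deduplicate on the natural key, keeping the most recently dated record
    out = (out.sort_values("joined_on", na_position="first")
              .drop_duplicates(subset="customer_id", keep="last"))

    return out
```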
The second pass is harder. You're fixing edge cases, resolving ambiguities, reconciling conflicting sources. The quality needle moves from 90% to maybe 95%. Good work, but the effort per percentage point has roughly doubled.
The third pass? You're now in the territory of diminishing returns. Moving from 95% to 98% requires domain expertise, manual review, cross-system validation, and often difficult conversations with data owners about conflicting definitions of truth. The cost per percentage point has increased by an order of magnitude.
The crossover point: marginal value vs marginal cost
Quality is a means to an end. The question is not “how clean can we make the data?” but “at what point does another £1 of cleaning stop improving outcomes?”
- Early: defects drop fast, manual reconciliation shrinks, teams move quicker.
- Middle: improvements are real but slower; you start paying for edge cases.
- Late: cost accelerates while business impact flattens. That’s where perfection turns into waste.
A practical stop rule: track one or two KPIs tied to the decision (cycle time, complaint rate, write-offs, audit exceptions). When marginal quality work stops shifting those metrics, redeploy the budget.
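One way to make that stop rule concrete, purely as an illustration with invented KPI figures and thresholds, is to compare KPI movement per £1,000 of cleansing spend across successive passes and stop funding when the marginal effect falls below a pre-agreed floor:

```python
# Illustrative stop rule: stop funding further passes when the KPI movement
# per £1,000 of cleansing spend drops below a pre-agreed floor.
# All figures below are invented for the sketch.

passes = [
    # (label, spend_gbp, kpi_before, kpi_after)  e.g. KPI = audit exceptions per month
    ("pass 1: dedupe + standardise", 20_000, 140, 55),
    ("pass 2: edge cases + reconciliation", 35_000, 55, 30),
    ("pass 3: manual review + cross-system checks", 60_000, 30, 26),
]

MIN_KPI_SHIFT_PER_1K = 0.5  # agreed floor: 0.5 fewer exceptions per £1,000 spent

for label, spend, before, after in passes:
    shift_per_1k = (before - after) / (spend / 1_000)
    decision = "fund" if shift_per_1k >= MIN_KPI_SHIFT_PER_1K else "redeploy budget"
    print(f"{label}: {shift_per_1k:.2f} exceptions avoided per £1k -> {decision}")
```

On these invented numbers the third pass falls well below the floor, which is exactly the point at which the budget should move elsewhere.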
Three forces in tension
Understanding the economics of data quality means holding three competing forces in view simultaneously: the quality curve (how much better the data gets with each pass), the cost curve (what each pass costs), and the business value threshold (the point beyond which extra quality stops changing decisions). Each pulls in a different direction, and the tension between them defines where sensible investment ends and diminishing returns begin.
What poor data quality costs in the real world
Data quality failures rarely show up as “data issues” on a P&L. They show up as leakage, friction, and avoidable risk.
- Revenue leakage: mis-billing, missed renewals, incorrect pricing, failed collections.
- Wasted spend: duplicate customers, wrong targeting, returned mail, failed deliveries.
- Operational drag: manual reconciliations, rework, delayed decisions, slower launches.
- Customer harm: incorrect communications, repeated support contacts, broken experiences.
- Risk and compliance: audit exceptions, reporting errors, remediation programmes.
- AI-specific costs: bad automated decisions, downstream rework, model distrust, rollbacks.
The goal is not abstract perfection. It’s reducing the specific, measurable costs your organisation is already paying.
Who should care (and why)
Different leaders experience “bad data” differently. Translating quality work into their language is how programmes get funded and sustained.
The critical insight about those three forces is that they don't operate independently. The cost curve accelerates precisely as the quality curve flattens. Meanwhile, the business value threshold sits somewhere in between: high enough that inadequate data causes real harm, but below the point where further investment stops generating meaningful returns.
Where does your organisation sit?
Every data programme falls into one of three zones. Most organisations don't know which one they're in — and that lack of awareness is itself a form of waste.
"The goal of data quality is not perfection. It is fitness for purpose — data that is accurate enough, timely enough, and governed enough to support the decisions that matter."
— The Data Quality Craftsman's Principle

Under-investment is dangerous but visible — reports don't reconcile, dashboards tell conflicting stories, regulators ask questions. Over-investment is more insidious because it feels like diligence. Teams spend months perfecting datasets that only needed to be directionally correct, or apply enterprise-grade governance to data that serves a single quarterly report.
The value zone is different for every data domain within the same organisation. Financial reporting data needs higher accuracy than a marketing segmentation model. Patient safety records demand near-perfection. A product recommendation engine can tolerate fuzziness that a regulatory submission cannot.
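One way to make those differing thresholds explicit (the domains and figures below are illustrative, not prescriptive) is to record a "good enough" target per domain and test measured quality against that, rather than against a single organisation-wide standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityTarget:
    domain: str
    good_enough_accuracy: float  # fraction of records passing validation
    rationale: str

# Illustrative targets: each domain earns its own threshold, not a global one.
TARGETS = [
    QualityTarget("patient_safety_records", 0.995, "near-perfection: clinical risk"),
    QualityTarget("financial_reporting", 0.98, "audit and regulatory exposure"),
    QualityTarget("marketing_segmentation", 0.90, "directionally correct is enough"),
    QualityTarget("product_recommendations", 0.85, "tolerates fuzziness"),
]

def needs_investment(domain: str, measured_accuracy: float) -> bool:
    """True if the domain has fallen below its own 'good enough' line."""
    target = next(t for t in TARGETS if t.domain == domain)
    return measured_accuracy < target.good_enough_accuracy

print(needs_investment("marketing_segmentation", 0.92))  # False: leave it alone
print(needs_investment("financial_reporting", 0.92))     # True: fund a pass
```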
Data quality is a discipline, not a project
Here's the part that catches organisations off guard: data quality degrades naturally over time. Customer records go stale. Business rules evolve. Source systems change their schemas. Staff turnover means institutional knowledge about data definitions walks out the door. A dataset that was 95% accurate six months ago may have drifted to 85% without anyone intervening.
This means the diminishing returns curve isn't something you climb once and stand atop. You're constantly being pushed back down it. The smart approach treats data quality as an ongoing operational discipline — a series of targeted, iterative passes across different data domains, allocating effort where the gradient is steepest and the business impact is highest.
Think of it as portfolio management. You're managing a portfolio of diminishing returns curves — one for customer data, one for financial data, one for operational metrics, one for regulatory submissions. Each has a different shape, a different threshold, and a different rate of degradation. The craft is knowing which curve to invest in today.
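A rough sketch of that portfolio view, with invented domains, thresholds, and impact scores, is to rank each domain by how far it has drifted below its own threshold, weighted by business impact, and put the next pass wherever that product is largest:

```python
# Illustrative portfolio view: rank domains by (gap below threshold x business impact).
# All figures are invented; in practice they come from profiling and the KPI work above.

domains = {
    # domain: (current_quality, good_enough_threshold, business_impact_1_to_5)
    "customer_identity": (0.86, 0.95, 5),
    "financial_reporting": (0.97, 0.98, 5),
    "operational_metrics": (0.90, 0.92, 3),
    "regulatory_submissions": (0.96, 0.99, 4),
}

def priority(current: float, threshold: float, impact: int) -> float:
    gap = max(0.0, threshold - current)  # only below-threshold domains score
    return gap * impact

ranked = sorted(domains.items(), key=lambda kv: priority(*kv[1]), reverse=True)

for name, (cur, thr, imp) in ranked:
    print(f"{name}: gap={max(0.0, thr - cur):.2f}, impact={imp}, "
          f"priority={priority(cur, thr, imp):.3f}")
```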
A practical framework for data quality investment
After more than two decades of navigating these trade-offs across industries (healthcare, travel, media, finance), the approach that consistently delivers results follows a structured cycle. Not a one-off audit, but a repeatable discipline.
- Pick 1–2 high-value decisions (pricing, collections, fraud, clinical safety, regulatory reporting).
- Identify the top 3 failure modes caused by bad data (false approvals, missed flags, wrong customer identity).
- Estimate cost per error and frequency (range is fine): cost per error × volume = monthly exposure (a worked sketch follows the table below).
- Map the minimum controls required (accuracy, timeliness, lineage, auditability) and the “good enough” threshold.
- Fund the work with the best marginal return first, then re-check the KPI movement monthly.
| Decision | Failure mode | How to size impact | Minimum controls |
|---|---|---|---|
| Approve credit limit changes | False approvals (losses) / false declines (lost revenue) | Estimate: (cost per error × monthly volume) ± risk buffer | Accuracy + timeliness + lineage + auditability |
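Sizing the exposure for that example row is simple arithmetic; the figures below are invented purely to show the shape of the calculation:

```python
# Worked sizing for the credit-limit example above (all figures invented).
cost_per_false_approval = 1_200      # £ average loss when a bad limit increase is approved
false_approvals_per_month = 40
cost_per_false_decline = 300         # £ revenue lost per wrongly declined customer
false_declines_per_month = 150
risk_buffer = 0.20                   # ±20% to reflect uncertainty in the estimates

point_estimate = (cost_per_false_approval * false_approvals_per_month
                  + cost_per_false_decline * false_declines_per_month)
low, high = point_estimate * (1 - risk_buffer), point_estimate * (1 + risk_buffer)

print(f"Monthly exposure: £{point_estimate:,.0f} (range £{low:,.0f} to £{high:,.0f})")
```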
Where does your data sit on the curve?
In 30 minutes we can pinpoint which data domains are under-invested, which are over-invested, and where targeted effort will deliver the highest ROI and risk reduction.
- Ranked list of data domains by ROI and risk exposure
- Recommended “good enough” threshold per domain (based on decisions and error cost)
- A 90‑day plan: owners, controls, monitoring cadence, and quick wins