Insight Article

The Economics of Data Quality

Every business sits somewhere on the diminishing returns curve. The ones that waste money either under-invest — making decisions on bad data — or over-invest, chasing perfection long past the point of meaningful return. The job is knowing where to stop.

The fundamental law nobody talks about

Most organisations approach data quality as a binary problem: either the data is "clean" or it isn't. Twenty-five years of working inside data programmes — from NHS patient records to customer identity systems processing millions of rows — has taught me something different. Data quality is an economic problem, and it follows the same diminishing returns curve as any other investment.

The first pass of data cleansing is transformative. Deduplication, standardisation, fixing null values and obvious errors — this work typically lifts data quality from somewhere around 60–70% to 85–90%. It's systematic, largely automatable, and the return on investment is extraordinary.

The second pass is harder. You're fixing edge cases, resolving ambiguities, reconciling conflicting sources. The quality needle moves from 90% to maybe 95%. Good work, but the effort per percentage point has roughly doubled.

The third pass? You're now in the territory of diminishing returns. Moving from 95% to 98% requires domain expertise, manual review, cross-system validation, and often difficult conversations with data owners about conflicting definitions of truth. The cost per percentage point has increased by an order of magnitude.

Figure 1
The diminishing returns of data quality investment. Quality climbs steeply through Pass 1 and flattens by Pass 3 and beyond, while cost keeps rising with investment effort; the sweet spot zone sits where quality is already high but cost has not yet exceeded the quality gains.

The crossover point: marginal value vs marginal cost

Quality is a means to an end. The question is not “how clean can we make the data?” but “at what point does another £1 of cleaning stop improving outcomes?”

  • Early: defects drop fast, manual reconciliation shrinks, teams move quicker.
  • Middle: improvements are real but slower; you start paying for edge cases.
  • Late: cost accelerates while business impact flattens. That’s where perfection turns into waste.

A practical stop rule: track one or two KPIs tied to the decision (cycle time, complaint rate, write-offs, audit exceptions). When marginal quality work stops shifting those metrics, redeploy the budget.
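As a minimal sketch of that stop rule (all names and figures below are hypothetical, not drawn from any particular programme), the check is simply marginal value against marginal cost for each cleansing pass:

```python
# Illustrative stop rule: keep funding quality passes only while the KPI
# movement they buy is worth more than they cost. All figures are hypothetical.

passes = [
    # (pass name, cost of pass in GBP, KPI before, KPI after)
    # KPI here: manual reconciliation hours per month (lower is better)
    ("Pass 1: dedupe + standardise", 40_000, 320, 140),
    ("Pass 2: edge cases and conflicting sources", 60_000, 140, 100),
    ("Pass 3: expert review of residual records", 90_000, 100, 92),
]

VALUE_PER_HOUR_SAVED = 450  # assumed annualised value (GBP) of removing one reconciliation hour/month

for name, cost, before, after in passes:
    marginal_value = (before - after) * VALUE_PER_HOUR_SAVED
    verdict = "continue" if marginal_value > cost else "stop / redeploy budget"
    print(f"{name}: cost £{cost:,}, KPI {before} -> {after}, value ~£{marginal_value:,} -> {verdict}")
```

On these made-up numbers the first pass pays for itself roughly twice over and the third does not come close, which is the whole argument in miniature.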

Four forces in tension

Understanding the economics of data quality means holding four competing forces in view simultaneously. Each pulls in a different direction, and the tension between them defines where sensible investment ends and diminishing returns begin.

  • Quality (the improvement curve): steep early gains flatten rapidly; the first 25% of effort delivers 80% of the quality improvement.
  • Cost (the cost escalation): cost per unit of improvement accelerates, and late-stage fixes require domain experts, not automation.
  • Value (the business threshold): a "good enough" threshold exists where data becomes trustworthy for decision-making, and it varies by use case.
  • Time (the iteration cycle): data quality degrades over time; without ongoing governance, yesterday's clean data becomes tomorrow's problem.

What poor data quality costs in the real world

Data quality failures rarely show up as “data issues” on a P&L. They show up as leakage, friction, and avoidable risk.

The goal is not abstract perfection. It’s reducing the specific, measurable costs your organisation is already paying.

Who should care (and why)

Different leaders experience “bad data” differently. Translating quality work into their language is how programmes get funded and sustained.

  • CFO: write-offs, leakage, audit exceptions, working capital visibility.
  • COO: cycle time, rework, operational friction, incident and escalation rates.
  • CMO: wasted spend, attribution reliability, suppression and segmentation accuracy.
  • CRO / Compliance: traceability, controls, regulatory reporting confidence, remediation risk.
  • CTO: system coupling, data pipeline reliability, scalability, change failure rate.

The critical insight about the forces above is that they don't operate independently. The cost curve accelerates precisely as the quality curve flattens. Meanwhile, the business value threshold sits somewhere in between: high enough that inadequate data causes real harm, but below the point where further investment stops generating meaningful returns.

Where does your organisation sit?

Every data programme falls into one of three zones. Most organisations don't know which one they're in — and that lack of awareness is itself a form of waste.

  • Under-invested: decisions made on unreliable data. Risk of flawed strategy and compliance failures.
  • Value zone: data is trustworthy for its intended purpose. Investment matched to business need.
  • Over-invested: chasing perfection past the point of return. Budget consumed, marginal gains.

"The goal of data quality is not perfection. It is fitness for purpose — data that is accurate enough, timely enough, and governed enough to support the decisions that matter."

— The Data Quality Craftsman's Principle

Under-investment is dangerous but visible — reports don't reconcile, dashboards tell conflicting stories, regulators ask questions. Over-investment is more insidious because it feels like diligence. Teams spend months perfecting datasets that only needed to be directionally correct, or apply enterprise-grade governance to data that serves a single quarterly report.

The value zone is different for every data domain within the same organisation. Financial reporting data needs higher accuracy than a marketing segmentation model. Patient safety records demand near-perfection. A product recommendation engine can tolerate fuzziness that a regulatory submission cannot.

The AI Factor
AI doesn't just use your data. It amplifies whatever quality exists in it.
Poor data quality in traditional reporting produces visible errors — a wrong number in a spreadsheet, a mismatch in a dashboard. Poor data quality feeding an AI model produces something worse: confidently wrong outputs that look plausible, pass surface-level scrutiny, and quietly corrupt downstream decisions. The quality threshold shifts higher when AI enters the picture, because the consequences of getting it wrong become harder to detect and more expensive to unwind.
Figure 2
How AI shifts the data quality threshold. Traditional analytics and reporting are typically sufficient at around 90% quality, where the margin for error is visible; AI-driven decision systems require roughly 96%+ quality because errors are hidden, and closing that extra ~6 percentage points demands significantly more investment.

Data quality is a discipline, not a project

Here's the part that catches organisations off guard: data quality degrades naturally over time. Customer records go stale. Business rules evolve. Source systems change their schemas. Staff turnover means institutional knowledge about data definitions walks out the door. A dataset that was 95% accurate six months ago may have drifted to 85% without anyone intervening.

This means the diminishing returns curve isn't something you climb once and stand atop. You're constantly being pushed back down it. The smart approach treats data quality as an ongoing operational discipline — a series of targeted, iterative passes across different data domains, allocating effort where the gradient is steepest and the business impact is highest.
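To make the drift concrete, here is a minimal sketch of the decay (the 2%-per-month rate and 90% threshold are illustrative assumptions, not measured constants):

```python
# Illustrative only: simple compounding decay of data quality between cleansing
# passes. Stale records, schema changes, and rule drift all erode quality even
# when nobody touches the data.

MONTHLY_DECAY = 0.02        # assumed: ~2% of current quality eroded per month
BUSINESS_THRESHOLD = 0.90   # assumed "good enough" line for this domain

quality = 0.95              # quality immediately after a cleansing pass
for month in range(1, 13):
    quality *= (1 - MONTHLY_DECAY)
    flag = "  <- below threshold, schedule the next pass" if quality < BUSINESS_THRESHOLD else ""
    print(f"Month {month:2d}: {quality:.1%}{flag}")
```

At that assumed rate a 95% dataset slips below a 90% threshold inside three months and lands in the mid-80s within half a year, which is exactly the drift described above.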

Think of it as portfolio management. You're managing a portfolio of diminishing returns curves — one for customer data, one for financial data, one for operational metrics, one for regulatory submissions. Each has a different shape, a different threshold, and a different rate of degradation. The craft is knowing which curve to invest in today.

Figure 3
The reality: quality degrades between iterations. Data quality over time with iterative improvement: each pass (Iteration 1 to 4) lifts quality above the business and AI thresholds, then natural decay pulls it back down until the next pass.

A practical framework for data quality investment

After two decades of navigating these trade-offs across industries — healthcare, travel, media, finance — the approach that consistently delivers results follows a structured cycle. Not a one-off audit, but a repeatable discipline.

  1. Assess where you are on the curve. Profile each data domain independently. A blanket quality score across the organisation is meaningless; you need to know where customer data sits versus financial data versus operational metrics.
  2. Define the threshold that matters. Work backwards from the decision the data supports. Regulatory reporting may demand 99%+; a marketing segmentation model may be perfectly effective at 92%. Let the business outcome define the target.
  3. Invest where the gradient is steepest. Allocate effort to the data domains where a unit of investment still delivers meaningful quality improvement. If you're at 92% and the threshold is 90%, that budget is better spent elsewhere (see the sketch after this list).
  4. Build governance that prevents decay. Automated monitoring, validation rules, ownership frameworks, and regular profiling cadences: the cheapest quality improvement is preventing degradation in the first place.
  5. Re-assess when AI enters the picture. If data is feeding AI models, revisit every threshold. The "good enough" line has moved, and the cost of getting it wrong has multiplied.
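A minimal sketch of the triage behind steps 1–3 (the domains, scores, thresholds, and cost figures are hypothetical placeholders, not benchmarks): profile each domain against its own threshold and rank the gaps by how cheaply they can be closed.

```python
# Illustrative portfolio triage: invest where the gradient is steepest.
# Every figure is a placeholder; real values come from profiling each domain.

domains = [
    # (domain, current quality, required threshold, estimated cost per +1 percentage point)
    ("Regulatory submissions", 0.96, 0.99, 25_000),
    ("Customer identity",      0.88, 0.95,  8_000),
    ("Marketing segmentation", 0.92, 0.90,  6_000),   # already above its threshold
    ("Financial reporting",    0.93, 0.98, 15_000),
]

def priority(domain):
    _, current, threshold, cost_per_point = domain
    gap_points = max(0.0, (threshold - current) * 100)
    return gap_points / cost_per_point   # bigger gap, cheaper fix -> higher priority

for name, current, threshold, _ in sorted(domains, key=priority, reverse=True):
    gap = max(0.0, (threshold - current) * 100)
    status = f"{gap:.0f} points below threshold" if gap else "at/above threshold: stop investing"
    print(f"{name:24s} {current:.0%} vs {threshold:.0%} target  |  {status}")
```

The ordering, not the precision, is the point: the domain furthest below its own threshold with the cheapest path to it gets this quarter's effort.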
Build a business case in 30 minutes
If you can’t explain the spend in terms of decisions, outcomes, and risk appetite, it won’t survive contact with budgets. This is the fastest way to get to a defendable number.
  1. Pick 1–2 high-value decisions (pricing, collections, fraud, clinical safety, regulatory reporting).
  2. Identify the top 3 failure modes caused by bad data (false approvals, missed flags, wrong customer identity).
  3. Estimate cost per error and frequency (a range is fine): cost per error × volume = monthly exposure; a worked sketch follows the example decision map below.
  4. Map the minimum controls required (accuracy, timeliness, lineage, auditability) and the “good enough” threshold.
  5. Fund the work with the best marginal return first, then re-check the KPI movement monthly.
Example decision map
  Decision: Approve credit limit changes
  Failure mode: False approvals (losses) / false declines (lost revenue)
  How to size impact: Estimate (cost per error × monthly volume) ± risk buffer
  Minimum controls: Accuracy + timeliness + lineage + auditability
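As a worked version of step 3, applied to the credit-limit decision above with purely illustrative figures (every input is an assumption to show the arithmetic, not a benchmark):

```python
# Illustrative exposure sizing for one decision: approve credit limit changes.
# Replace every input with your own estimates or ranges.

monthly_decisions   = 4_000    # decisions per month that rely on this data (assumed)
error_rate          = 0.015    # share of decisions that go wrong because of bad data (assumed)
cost_per_error_low  = 250      # GBP: small write-off or rework (assumed)
cost_per_error_high = 1_200    # GBP: larger loss or remediation (assumed)

errors_per_month = monthly_decisions * error_rate
low  = errors_per_month * cost_per_error_low
high = errors_per_month * cost_per_error_high

print(f"Errors per month:  {errors_per_month:.0f}")
print(f"Monthly exposure:  £{low:,.0f} – £{high:,.0f}")
print(f"Annualised:        £{low * 12:,.0f} – £{high * 12:,.0f}")
```

If the minimum controls that close the dominant failure modes cost materially less than that exposure range, the business case is straightforward; if not, the budget belongs on a different decision.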

Where does your data sit on the curve?

In 30 minutes we can pinpoint which data domains are under-invested, which are over-invested, and where targeted effort will deliver the highest ROI and risk reduction.

Book a Discovery Call