
We Got Data Quality Wrong Until We Built This Framework
We used to think data quality was something to handle when something breaks.
A dashboard looks off, a stakeholder asks why, we trace the pipeline, patch the issue, and move on. That approach can keep the lights on, but it does not build trust. What changed for us was treating data quality as a repeatable system we run every day, not an occasional cleanup task.
Here is what we got wrong, and the framework that fixed it.
The three things we got wrong
First, we treated quality as cleanup instead of prevention. If the only time we look at quality is after a report looks strange, the best-case outcome is that we catch the problem late.
Second, we relied on tools without clear ownership. Monitoring and tests help, but when no one owns a dataset or a KPI definition, issues turn into slow coordination problems.
Third, we tested tables, but not business metrics. Schema checks can pass while the KPI quietly drifts due to joins, mapping changes, or upstream meaning shifts.
Once we accepted these three mistakes, the work stopped being mysterious. We needed a framework, not more ad hoc fixes.
What a data quality framework actually is
A data quality framework is the day-to-day operating system for keeping data trustworthy.
It is not a single tool. It is a structured way to define what “good” means, assign ownership, enforce checks, monitor drift, respond to incidents, and continuously improve.
If data quality management is the umbrella, the framework is how the umbrella works in practice. It turns intent into routines.
In our case, the framework came down to six building blocks:
- Standards and definitions
- Ownership and decision rights
- Automated quality checks
- Monitoring and alerts
- Incident response
- Continuous improvement cadence
The key values we enforce
A framework only works if “quality” is measurable. We made it measurable in three ways.
Standards, definitions, and shared meaning
Before we wrote tests, we wrote definitions.
For critical datasets and KPIs, we document:
- What it represents in business terms
- The source of truth, including authoritative systems
- The transformation logic at a high level
- Refresh expectations, including freshness targets
- Known exclusions and edge cases
This is not documentation for its own sake. It prevents “same word, different meaning” problems that no amount of testing can fix.
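To make this concrete, here is a minimal sketch of how such a definition could be captured as structured metadata. The field names and the example KPI are illustrative, not a prescribed schema:

```python
# A hypothetical KPI definition record; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class KpiDefinition:
    name: str                      # business name of the KPI
    business_meaning: str          # what the number represents in plain language
    source_of_truth: str           # the authoritative upstream system
    transformation_summary: str    # high-level description of the logic
    freshness_target_hours: int    # how stale the KPI may be before it is a problem
    known_exclusions: list[str] = field(default_factory=list)

# Example entry, written once and reviewed with the KPI's owner.
mac = KpiDefinition(
    name="monthly_active_customers",
    business_meaning="Distinct customers with at least one billed event in the month",
    source_of_truth="billing_db.events",
    transformation_summary="Dedup events, join to customers, count distinct per month",
    freshness_target_hours=24,
    known_exclusions=["internal test accounts", "sandbox tenants"],
)
```

Keeping these records in version control means a definition change goes through review, just like a code change.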
The quality dimensions we track
We kept the dimensions simple and practical. For most teams, these cover the majority of real failures:
- Accuracy, does it match reality
- Completeness, are required fields present
- Consistency, does it agree across systems and time
- Timeliness, does it arrive when needed
- Validity, does it follow rules and formats
- Uniqueness, are duplicates intentional and controlled
- Relevance, is the dataset tied to an actual business use
Not every dataset needs the same intensive treatment. Relevance helps us decide where to be strict and where to be lightweight.
Data contracts, expectations that stop breakages early
A big step forward was treating important datasets like products with contracts.
A contract can include:
- Schema expectations and allowed types
- Accepted null behavior
- Allowed value ranges or categories
- Freshness targets
- Acceptable change, for example volume changes within a threshold
- Deprecation rules for fields, so changes do not surprise downstream teams
Contracts create stability. They reduce the number of “surprise changes” that show up as broken dashboards later.
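As an illustration, a lightweight contract can be written down as plain configuration. The dataset name, keys, and thresholds below are hypothetical examples, not a standard format:

```python
# A hypothetical data contract for an orders table, expressed as plain Python.
orders_contract = {
    "dataset": "analytics.orders",
    "schema": {
        "order_id": {"type": "string", "nullable": False, "unique": True},
        "customer_id": {"type": "string", "nullable": False},
        "order_total": {"type": "decimal", "nullable": False, "min": 0},
        "status": {"type": "string", "allowed": ["placed", "shipped", "cancelled"]},
        "created_at": {"type": "timestamp", "nullable": False},
    },
    "freshness": {"max_delay_hours": 6},
    "volume": {"max_day_over_day_change": 0.30},  # flag >30% swings for review
    "deprecations": {
        # fields slated for removal, with a date downstream teams can plan around
        "legacy_channel_code": {"remove_after": "2026-03-31"},
    },
}
```

Whether the contract is enforced by a testing tool or custom checks matters less than the fact that it exists, is versioned, and is agreed on by producer and consumer.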
The operating model, who does what
One reason data quality stays vague is that responsibility stays vague. We fixed that by making roles explicit, without adding heavy bureaucracy.
Data owner
Accountable for a dataset or KPI. The owner decides what “correct” means, approves definition changes, and prioritizes quality fixes when tradeoffs exist.
Data steward
Keeps the day-to-day quality process running. The steward watches quality signals, triages issues, manages thresholds, and helps maintain documentation.
Platform custodian
Runs the underlying infrastructure. This role keeps pipelines, storage, access controls, and runtimes reliable.
Data consumers
Use the data, and report issues. Consumers are part of the feedback loop, not passive recipients.
Most importantly, we defined decision rights: who can change KPI definitions, who can approve exceptions, and who signs off on a breaking change. That clarity alone removes weeks of back-and-forth later.
Where checks live, how we implement it
A framework fails when checks are only applied in one place. We apply checks in three layers.
1) At entry, prevent bad data from entering the system
This is the cheapest place to catch errors.
Examples:
- Required fields and format validation
- Controlled vocabularies instead of free-text where possible
- Basic business rules, for example start date cannot be after end date
- Deduplication logic at ingestion when sources are noisy
The goal is not perfection, it is reducing avoidable junk early.
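A minimal sketch of what an entry-level check can look like in code, assuming records arrive as Python dicts; the field names, vocabulary, and rules are examples:

```python
# Illustrative entry-level validation; field names and rules are examples.
from datetime import date

ALLOWED_CHANNELS = {"web", "store", "partner"}  # controlled vocabulary, not free text

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []

    # Required fields must be present and non-empty
    for field_name in ("customer_id", "channel", "start_date", "end_date"):
        if not record.get(field_name):
            problems.append(f"missing required field: {field_name}")

    # Controlled vocabulary instead of free text
    if record.get("channel") and record["channel"] not in ALLOWED_CHANNELS:
        problems.append(f"unknown channel: {record['channel']!r}")

    # Basic business rule: start date cannot be after end date
    start, end = record.get("start_date"), record.get("end_date")
    if isinstance(start, date) and isinstance(end, date) and start > end:
        problems.append("start_date is after end_date")

    return problems
```

Rejected records can be quarantined with their problem list instead of silently dropped, so sources can be fixed at the origin.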
2) In pipelines, keep transformations from quietly degrading data
These checks protect the shape and health of datasets as they move.
Common checks:
- Schema tests, type checks, and contract enforcement
- Null thresholds for critical fields
- Row count and volume anomaly checks
- Referential integrity checks for joins
- Distribution checks for categories and numeric ranges
- Freshness checks for key tables
This layer gives fast signals when upstream sources or transformations change.
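To ground this, here are illustrative pipeline-layer checks written with pandas. The thresholds are examples and would normally come from the dataset's contract rather than being hard-coded:

```python
# Illustrative pipeline-layer checks; thresholds are examples only.
import pandas as pd

def check_null_rate(df: pd.DataFrame, column: str, max_null_rate: float) -> bool:
    """Pass only if the share of nulls in a critical column stays under the threshold."""
    return df[column].isna().mean() <= max_null_rate

def check_volume(row_count: int, trailing_avg: float, max_change: float = 0.30) -> bool:
    """Pass only if today's row count stays within max_change of the trailing average."""
    if trailing_avg == 0:
        return row_count == 0
    return abs(row_count - trailing_avg) / trailing_avg <= max_change

def check_freshness(df: pd.DataFrame, ts_column: str, max_delay_hours: int) -> bool:
    """Pass only if the newest record is within the freshness target.
    Assumes ts_column holds timezone-naive timestamps in the same clock as the check."""
    delay = pd.Timestamp.now() - pd.to_datetime(df[ts_column]).max()
    return delay <= pd.Timedelta(hours=max_delay_hours)
```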
3) At metrics, protect the business truth
This is where we learned the biggest lesson. A pipeline can look healthy while the KPI becomes untrustworthy.
Metric-level checks include:
- Reconciliation against source totals for critical KPIs
- Trend break detection, sudden step changes or drops
- Sanity ranges, values that should never be negative or exceed expected bounds
- Cross-system parity checks when KPIs exist in multiple tools
- Slice-level health checks, not just overall totals
Table checks keep data stable. Metric checks keep decisions stable.
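A few metric-level checks, sketched in the same spirit; the tolerances are illustrative and should be set per KPI with its owner:

```python
# Illustrative metric-level checks; tolerances are examples only.

def reconcile(kpi_value: float, source_total: float, tolerance: float = 0.01) -> bool:
    """Pass if the reported KPI stays within 1% of the source-of-truth total."""
    if source_total == 0:
        return kpi_value == 0
    return abs(kpi_value - source_total) / abs(source_total) <= tolerance

def detect_step_change(series: list[float], max_jump: float = 0.25) -> bool:
    """Flag a sudden step change: the latest value deviates more than 25%
    from the average of the previous points."""
    if len(series) < 2:
        return False
    baseline = sum(series[:-1]) / len(series[:-1])
    if baseline == 0:
        return series[-1] != 0
    return abs(series[-1] - baseline) / abs(baseline) > max_jump

def within_sanity_range(value: float, low: float = 0.0, high: float = float("inf")) -> bool:
    """Catch values that should never be negative or exceed an expected bound."""
    return low <= value <= high
```

Running the same checks per slice (per region, per product line) catches drift that overall totals can hide.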
Monitoring and incident response, how we keep it from drifting
Even great checks fail if no one sees the signals or knows what to do next. Monitoring is part of the framework, not an optional add-on.
The dashboards we actually use
We keep monitoring focused on a few signals that drive action:
- Freshness and pipeline health
- Test failures by criticality
- KPI drift and anomalies
- Incidents over time, patterns and recurring sources
If a dashboard is not used, it gets deleted. Noise kills quality programs.
Severity levels that reduce panic
Not every issue deserves the same response. We classify incidents into a small set of severity tiers based on impact.
For example:
- High severity, affects customer billing, compliance, or board-level KPIs, often triggers blocking or rollback
- Medium severity, affects internal reporting but has workarounds, usually warn plus ticket
- Low severity, minor dataset issues, tracked and fixed in cadence
Clear severity rules prevent chaos. Teams know when to stop the line and when to proceed safely.
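One way to make the tiers executable is a small severity mapping that routes a failed check to a response. This is a hypothetical sketch mirroring the tiers above; wiring it to a pager or ticketing system is left to the team's tooling:

```python
# Hypothetical severity tiers and the response each one triggers.
from enum import Enum

class Severity(Enum):
    HIGH = "high"      # billing, compliance, board-level KPIs
    MEDIUM = "medium"  # internal reporting with workarounds
    LOW = "low"        # minor dataset issues

RESPONSE = {
    Severity.HIGH: {"block_pipeline": True, "page_owner": True, "consider_rollback": True},
    Severity.MEDIUM: {"block_pipeline": False, "open_ticket": True},
    Severity.LOW: {"block_pipeline": False, "track_in_backlog": True},
}
```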
The workflow
Alert, triage, identify impacted KPIs, fix, verify, then add a guardrail so the same issue is less likely to return.
That last step is the framework doing its job.
How we roll it out without boiling the ocean
The fastest way to fail at data quality is trying to “fix all data”. We roll out the framework in a way that creates wins early.
Start with what decisions rely on
Pick 3 to 5 critical KPIs. Then map the upstream datasets and transformations that power them. That defines your first quality perimeter.
Implement the minimum checks first
For those KPIs and upstream datasets, implement a small set of high-signal checks (see the sketch after this list):
- Freshness
- Volume anomalies
- Null thresholds for critical fields
- Referential integrity checks for important joins
- One or two metric reconciliations
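A minimal sketch of that first perimeter as declarative config; the dataset names, KPIs, and thresholds are placeholders:

```python
# Hypothetical first quality perimeter; names and thresholds are placeholders.
FIRST_PERIMETER = {
    "kpis": ["monthly_active_customers", "net_revenue", "churn_rate"],
    "datasets": {
        "billing_db.events": {
            "freshness_hours": 6,
            "max_volume_change": 0.30,
            "null_thresholds": {"customer_id": 0.0, "amount": 0.01},
            "foreign_keys": {"customer_id": "crm.customers.id"},
        },
    },
    "reconciliations": [
        {"kpi": "net_revenue", "against": "billing_db.invoice_totals", "tolerance": 0.01},
    ],
}
```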
Assign ownership and SLAs
A check without an owner is just a notification. For the first perimeter, define owners and response expectations.
Expand by domain
After the first perimeter, expand coverage domain by domain, not table by table. Scale what works, remove what creates noise.
The checklist we use for any new dataset or KPI
When a new dataset shows up, we try to keep the process consistent.
- Clear definition and business purpose
- Named owner and steward
- Known lineage and dependencies
- Contract, schema expectations and acceptable change
- Tests, thresholds, and monitoring signals
- Incident severity mapping, what happens if it breaks
- Review cadence, how we tune over time
This checklist turns “quality work” into a repeatable launch process.
What changed after we adopted the framework
The results were not magical, they were operational.
We spent less time debating numbers and more time using them.
Debugging became faster because ownership and checks narrowed the search space.
Repeated incidents dropped because learning turned into guardrails.
KPIs became more stable, which made analytics and AI use cases less fragile.
The biggest shift was trust. Once teams trust the data, everything moves faster.
Where Lestar fits
A framework is hard to run when transformations, definitions, and monitoring are scattered across too many systems. It becomes difficult to enforce consistency, trace lineage end to end, and connect quality failures to downstream impact.
Lestar helps by consolidating data work into a single environment, where transformations, governance, and quality checks can be applied consistently, monitored continuously, and traced back to the metrics and teams they affect.




