
We Got Data Quality Wrong Until We Built This Framework
We used to think data quality was something to handle when something breaks.
A dashboard looks off, a stakeholder asks why, we trace the pipeline, patch the issue, and move on. That approach can keep the lights on, but it does not build trust. What changed for us was treating data quality as a repeatable system we run every day, not an occasional cleanup task.
Here is what we got wrong, and the framework that fixed it.
The three things we got wrong
First, we treated quality as cleanup instead of prevention. If the only time we look at quality is after a report looks strange, the best-case outcome is that we catch the problem late.
Second, we relied on tools without clear ownership. Monitoring and tests help, but when no one owns a dataset or a KPI definition, issues turn into slow coordination problems.
Third, we tested tables, but not business metrics. Schema checks can pass while the KPI quietly drifts due to joins, mapping changes, or upstream meaning shifts.
Once we accepted these three mistakes, the work stopped being mysterious. We needed a framework, not more ad hoc fixes.
What a data quality framework actually is
A data quality framework is the day-to-day operating system for keeping data trustworthy.
It is not a single tool. It is a structured way to define what “good” means, assign ownership, enforce checks, monitor drift, respond to incidents, and continuously improve.
If data quality management is the umbrella, the framework is how the umbrella works in practice. It turns intent into routines.
In our case, the framework came down to six building blocks:
- Standards and definitions
- Ownership and decision rights
- Automated quality checks
- Monitoring and alerts
- Incident response
- Continuous improvement cadence
The key values we enforce
A framework only works if “quality” is measurable. We made it measurable in three ways.
Standards, definitions, and shared meaning
Before we wrote tests, we wrote definitions.
For critical datasets and KPIs, we document:
- What it represents in business terms
- The source of truth, including authoritative systems
- The transformation logic at a high level
- Refresh expectations, including freshness targets
- Known exclusions and edge cases
This is not documentation for its own sake. It prevents “same word, different meaning” problems that no amount of testing can fix.
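To make this concrete, here is a minimal sketch of how such a definition could be captured as structured metadata. The field names and the example KPI are illustrative, not a prescribed schema:

```python
# A hypothetical KPI definition record; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class KpiDefinition:
    name: str                      # business name of the KPI
    business_meaning: str          # what the number represents in plain language
    source_of_truth: str           # the authoritative upstream system
    transformation_summary: str    # high-level description of the logic
    freshness_target_hours: int    # how stale the KPI may be before it is a problem
    known_exclusions: list[str] = field(default_factory=list)

# Example entry, written once and reviewed with the KPI's owner.
mac = KpiDefinition(
    name="monthly_active_customers",
    business_meaning="Distinct customers with at least one billed event in the month",
    source_of_truth="billing_db.events",
    transformation_summary="Dedup events, join to customers, count distinct per month",
    freshness_target_hours=24,
    known_exclusions=["internal test accounts", "sandbox tenants"],
)
```

Keeping these records in version control means a definition change goes through review, just like a code change.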
The quality dimensions we track
We kept the dimensions simple and practical. For most teams, these cover the majority of real failures:
- Accuracy, does it match reality
- Completeness, are required fields present
- Consistency, does it agree across systems and time
- Timeliness, does it arrive when needed
- Validity, does it follow rules and formats
- Uniqueness, are duplicates intentional and controlled
- Relevance, is the dataset tied to an actual business use
Not every dataset needs the same intensive treatment. Relevance helps us decide where to be strict and where to be lightweight.
Data contracts, expectations that stop breakages early
A big step forward was treating important datasets like products with contracts.
A contract can include:
- Schema expectations and allowed types
- Accepted null behavior
- Allowed value ranges or categories
- Freshness targets
- Acceptable change, for example volume changes within a threshold
- Deprecation rules for fields, so changes do not surprise downstream teams
Contracts create stability. They reduce the number of “surprise changes” that show up as broken dashboards later.
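As an illustration, a lightweight contract can be written down as plain configuration. The dataset name, keys, and thresholds below are hypothetical examples, not a standard format:

```python
# A hypothetical data contract for an orders table, expressed as plain Python.
orders_contract = {
    "dataset": "analytics.orders",
    "schema": {
        "order_id": {"type": "string", "nullable": False, "unique": True},
        "customer_id": {"type": "string", "nullable": False},
        "order_total": {"type": "decimal", "nullable": False, "min": 0},
        "status": {"type": "string", "allowed": ["placed", "shipped", "cancelled"]},
        "created_at": {"type": "timestamp", "nullable": False},
    },
    "freshness": {"max_delay_hours": 6},
    "volume": {"max_day_over_day_change": 0.30},  # flag >30% swings for review
    "deprecations": {
        # fields slated for removal, with a date downstream teams can plan around
        "legacy_channel_code": {"remove_after": "2026-03-31"},
    },
}
```

Whether the contract is enforced by a testing tool or custom checks matters less than the fact that it exists, is versioned, and is agreed on by producer and consumer.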
The operating model, who does what
One reason data quality stays vague is that responsibility stays vague. We fixed that by making roles explicit, without adding heavy bureaucracy.
Data owner
Accountable for a dataset or KPI. The owner decides what “correct” means, approves definition changes, and prioritizes quality fixes when tradeoffs exist.
Data steward
Keeps the day-to-day quality process running. The steward watches quality signals, triages issues, manages thresholds, and helps maintain documentation.
Platform custodian
Runs the underlying infrastructure. This role keeps pipelines, storage, access controls, and runtimes reliable.
Data consumers
Use the data, and report issues. Consumers are part of the feedback loop, not passive recipients.
Most importantly, we defined decision rights: who can change KPI definitions, who can approve exceptions, and who signs off on a breaking change. That clarity alone removes weeks of back-and-forth later.
Where checks live, how we implement it
A framework fails when checks are only applied in one place. We apply checks in three layers.
1) At entry, prevent bad data from entering the system
This is the cheapest place to catch errors.
Examples:
- Required fields and format validation
- Controlled vocabularies instead of free-text where possible
- Basic business rules, for example start date cannot be after end date
- Deduplication logic at ingestion when sources are noisy
The goal is not perfection, it is reducing avoidable junk early.
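A minimal sketch of what an entry-level check can look like in code, assuming records arrive as Python dicts; the field names, vocabulary, and rules are examples:

```python
# Illustrative entry-level validation; field names and rules are examples.
from datetime import date

ALLOWED_CHANNELS = {"web", "store", "partner"}  # controlled vocabulary, not free text

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []

    # Required fields must be present and non-empty
    for field_name in ("customer_id", "channel", "start_date", "end_date"):
        if not record.get(field_name):
            problems.append(f"missing required field: {field_name}")

    # Controlled vocabulary instead of free text
    if record.get("channel") and record["channel"] not in ALLOWED_CHANNELS:
        problems.append(f"unknown channel: {record['channel']!r}")

    # Basic business rule: start date cannot be after end date
    start, end = record.get("start_date"), record.get("end_date")
    if isinstance(start, date) and isinstance(end, date) and start > end:
        problems.append("start_date is after end_date")

    return problems
```

Rejected records can be quarantined with their problem list instead of silently dropped, so sources can be fixed at the origin.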
2) In pipelines, keep transformations from quietly degrading data
These checks protect the shape and health of datasets as they move.
Common checks:
- Schema tests, type checks, and contract enforcement
- Null thresholds for critical fields
- Row count and volume anomaly checks
- Referential integrity checks for joins
- Distribution checks for categories and numeric ranges
- Freshness checks for key tables
This layer gives fast signals when upstream sources or transformations change.
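To ground this, here are illustrative pipeline-layer checks written with pandas. The thresholds are examples and would normally come from the dataset's contract rather than being hard-coded:

```python
# Illustrative pipeline-layer checks; thresholds are examples only.
import pandas as pd

def check_null_rate(df: pd.DataFrame, column: str, max_null_rate: float) -> bool:
    """Pass only if the share of nulls in a critical column stays under the threshold."""
    return df[column].isna().mean() <= max_null_rate

def check_volume(row_count: int, trailing_avg: float, max_change: float = 0.30) -> bool:
    """Pass only if today's row count stays within max_change of the trailing average."""
    if trailing_avg == 0:
        return row_count == 0
    return abs(row_count - trailing_avg) / trailing_avg <= max_change

def check_freshness(df: pd.DataFrame, ts_column: str, max_delay_hours: int) -> bool:
    """Pass only if the newest record is within the freshness target.
    Assumes ts_column holds timezone-naive timestamps in the same clock as the check."""
    delay = pd.Timestamp.now() - pd.to_datetime(df[ts_column]).max()
    return delay <= pd.Timedelta(hours=max_delay_hours)
```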
3) At metrics, protect the business truth
This is where we learned the biggest lesson. A pipeline can look healthy while the KPI becomes untrustworthy.
Metric-level checks include:
- Reconciliation against source totals for critical KPIs
- Trend break detection, sudden step changes or drops
- Sanity ranges, values that should never be negative or exceed expected bounds
- Cross-system parity checks when KPIs exist in multiple tools
- Slice-level health checks, not just overall totals
Table checks keep data stable. Metric checks keep decisions stable.
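A few metric-level checks, sketched in the same spirit; the tolerances are illustrative and should be set per KPI with its owner:

```python
# Illustrative metric-level checks; tolerances are examples only.

def reconcile(kpi_value: float, source_total: float, tolerance: float = 0.01) -> bool:
    """Pass if the reported KPI stays within 1% of the source-of-truth total."""
    if source_total == 0:
        return kpi_value == 0
    return abs(kpi_value - source_total) / abs(source_total) <= tolerance

def detect_step_change(series: list[float], max_jump: float = 0.25) -> bool:
    """Flag a sudden step change: the latest value deviates more than 25%
    from the average of the previous points."""
    if len(series) < 2:
        return False
    baseline = sum(series[:-1]) / len(series[:-1])
    if baseline == 0:
        return series[-1] != 0
    return abs(series[-1] - baseline) / abs(baseline) > max_jump

def within_sanity_range(value: float, low: float = 0.0, high: float = float("inf")) -> bool:
    """Catch values that should never be negative or exceed an expected bound."""
    return low <= value <= high
```

Running the same checks per slice (per region, per product line) catches drift that overall totals can hide.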
Monitoring and incident response, how we keep it from drifting
Even great checks fail if no one sees the signals or knows what to do next. Monitoring is part of the framework, not an optional add-on.
The dashboards we actually use
We keep monitoring focused on a few signals that drive action:
- Freshness and pipeline health
- Test failures by criticality
- KPI drift and anomalies
- Incidents over time, patterns and recurring sources
If a dashboard is not used, it gets deleted. Noise kills quality programs.
Severity levels that reduce panic
Not every issue deserves the same response. We classify incidents into a small set of severity tiers based on impact.
For example:
- High severity, affects customer billing, compliance, or board-level KPIs, often triggers blocking or rollback
- Medium severity, affects internal reporting but has workarounds, usually warn plus ticket
- Low severity, minor dataset issues, tracked and fixed in cadence
Clear severity rules prevent chaos. Teams know when to stop the line and when to proceed safely.
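One way to make the tiers executable is a small severity mapping that routes a failed check to a response. This is a hypothetical sketch mirroring the tiers above; wiring it to a pager or ticketing system is left to the team's tooling:

```python
# Hypothetical severity tiers and the response each one triggers.
from enum import Enum

class Severity(Enum):
    HIGH = "high"      # billing, compliance, board-level KPIs
    MEDIUM = "medium"  # internal reporting with workarounds
    LOW = "low"        # minor dataset issues

RESPONSE = {
    Severity.HIGH: {"block_pipeline": True, "page_owner": True, "consider_rollback": True},
    Severity.MEDIUM: {"block_pipeline": False, "open_ticket": True},
    Severity.LOW: {"block_pipeline": False, "track_in_backlog": True},
}
```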
The workflow
Alert, triage, identify impacted KPIs, fix, verify, then add a guardrail so the same issue is less likely to return.
That last step is the framework doing its job.
How we roll it out without boiling the ocean
The fastest way to fail at data quality is trying to “fix all data”. We roll out the framework in a way that creates wins early.
Start with what decisions rely on
Pick 3 to 5 critical KPIs. Then map the upstream datasets and transformations that power them. That defines your first quality perimeter.
Implement the minimum checks first
For those KPIs and upstream datasets, implement a small set of high-signal checks (see the sketch after this list):
- Freshness
- Volume anomalies
- Null thresholds for critical fields
- Referential integrity checks for important joins
- One or two metric reconciliations
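A minimal sketch of that first perimeter as declarative config; the dataset names, KPIs, and thresholds are placeholders:

```python
# Hypothetical first quality perimeter; names and thresholds are placeholders.
FIRST_PERIMETER = {
    "kpis": ["monthly_active_customers", "net_revenue", "churn_rate"],
    "datasets": {
        "billing_db.events": {
            "freshness_hours": 6,
            "max_volume_change": 0.30,
            "null_thresholds": {"customer_id": 0.0, "amount": 0.01},
            "foreign_keys": {"customer_id": "crm.customers.id"},
        },
    },
    "reconciliations": [
        {"kpi": "net_revenue", "against": "billing_db.invoice_totals", "tolerance": 0.01},
    ],
}
```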
Assign ownership and SLAs
A check without an owner is just a notification. For the first perimeter, define owners and response expectations.
Expand by domain
After the first perimeter, expand coverage domain by domain, not table by table. Scale what works, remove what creates noise.
The checklist we use for any new dataset or KPI
When a new dataset shows up, we try to keep the process consistent.
- Clear definition and business purpose
- Named owner and steward
- Known lineage and dependencies
- Contract, schema expectations and acceptable change
- Tests, thresholds, and monitoring signals
- Incident severity mapping, what happens if it breaks
- Review cadence, how we tune over time
This checklist turns “quality work” into a repeatable launch process.
What changed after we adopted the framework
The results were not magical, they were operational.
We spent less time debating numbers and more time using them.
Debugging became faster because ownership and checks narrowed the search space.
Repeated incidents dropped because learning turned into guardrails.
KPIs became more stable, which made analytics and AI use cases less fragile.
The biggest shift was trust. Once teams trust the data, everything moves faster.
Where Lestar fits
A framework is hard to run when transformations, definitions, and monitoring are scattered across too many systems. It becomes difficult to enforce consistency, trace lineage end to end, and connect quality failures to downstream impact.
Lestar helps by consolidating data work into a single environment, where transformations, governance, and quality checks can be applied consistently, monitored continuously, and traced back to the metrics and teams they affect.




