Data observability is the practice of continuously monitoring the health of your data as it moves through production pipelines. You catch problems in minutes rather than days. It rests on five pillars: freshness, volume, distribution, schema, and lineage. Each pillar catches a distinct failure mode. Without all five, some class of problem stays invisible until a stakeholder notices something is wrong.
The 5 pillars at a glance
| Pillar | What it catches | Silent failure without it |
|---|---|---|
| Freshness | Table not updated within expected window | Dashboard shows yesterday's revenue as today's |
| Volume | Row count drop or spike vs. baseline | Duplicate load inflates every downstream metric |
| Distribution | Null rate spike, value range shift, cardinality change | DAU query returns near-zero because user_id went null |
| Schema | Column added, removed, renamed, or retyped | Downstream join silently breaks on type mismatch |
| Lineage | Which tables and dashboards depend on the affected data | You fix the table but miss the 3 reports reading it |
Why dashboards don't protect you
Dashboards display data. They don't audit it.
The typical silent failure: a pipeline stalls at 2 a.m., a table fills with nulls, a revenue metric stops updating. Your dashboard still loads. The numbers still appear. They're just wrong. The interface gives no signal that anything broke.
Traditional monitoring answers infrastructure questions: Is the server up? Did the job complete? It can't tell you whether the 847 rows that loaded this morning are the right 847 rows, or whether the null rate in your order_value column just climbed from 0.2% to 34%.
That gap is what data observability closes: it watches the data itself, not just the pipes it flows through. See also: what is data quality, for the full checklist of signals that indicate data you can trust.
The 5 pillars of data observability
1. Freshness
Freshness is how recently a table was updated relative to its expected cadence. A sales table that updates every 15 minutes and hasn't moved in two hours is stale, and any dashboard reading it presents two-hour-old numbers as current.
Freshness monitoring sets a per-table SLA and alerts when the gap between now and the last update crosses it. The most effective systems learn the natural update rhythm by time of day and day of week, so they don't fire false positives at 3 a.m. on Sunday when your batch jobs genuinely don't run. See also: data freshness.
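A minimal sketch of that idea in Python, assuming you already collect the table's past update timestamps (for example, by polling the latest updated_at value). The function names and the "quiet window" rule are illustrative assumptions, not any particular tool's algorithm: a (weekday, hour) slot with no historical updates is treated as a window where staleness is expected, so the SLA is only enforced during slots where updates normally happen.

```python
from datetime import datetime, timedelta


def active_slots(update_times: list[datetime]) -> set[tuple[int, int]]:
    """(weekday, hour) slots in which the table has historically been updated."""
    return {(t.weekday(), t.hour) for t in update_times}


def freshness_alert(
    last_update: datetime,
    now: datetime,
    sla: timedelta,
    history: list[datetime],
) -> bool:
    """True if the table is staler than its SLA during a slot where updates are normal."""
    if (now.weekday(), now.hour) not in active_slots(history):
        # Quiet window (e.g. 3 a.m. Sunday): no updates expected, so no alert.
        return False
    return now - last_update > sla


# Hypothetical history: a table updated every 15 minutes, 9:00-17:45, on one Monday.
monday = datetime(2024, 1, 1)  # 2024-01-01 was a Monday
history = [monday + timedelta(hours=h, minutes=m)
           for h in range(9, 18) for m in (0, 15, 30, 45)]
sla = timedelta(minutes=15)

# Monday noon, last update two hours ago: stale during an active slot.
print(freshness_alert(datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 12, 0), sla, history))  # True
# Sunday 3 a.m.: historically quiet, so no false positive.
print(freshness_alert(datetime(2024, 1, 5, 17, 45), datetime(2024, 1, 7, 3, 0), sla, history))  # False
```

Timestamps here are naive and assumed to share one timezone. A real system would also need several weeks of history before trusting which slots are quiet.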
2. Volume
Volume anomaly detection tracks whether the number of rows in a table is within the expected range for that point in time. A sudden 40% drop in rows loaded signals a broken upstream extract. A 300% spike signals a duplicate load.
The hard part is defining "expected." Monday morning row counts differ from Friday afternoon. A useful baseline learns historical volume patterns by hour and day of week, then flags deviations relative to that learned distribution. No static threshold to set and forget. See also: row-count anomaly.
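One common way to build such a learned baseline, sketched here under the assumption that you record a row count per table per hour: bucket historical counts by (weekday, hour), then flag a new count whose z-score against its own bucket exceeds a threshold. The 3.0 threshold and the function names are hypothetical defaults, and production tools may use more robust statistics than mean and standard deviation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

Slot = tuple[int, int]  # (weekday, hour)


def learn_volume_baseline(history: list[tuple[datetime, int]]) -> dict[Slot, tuple[float, float]]:
    """Mean and standard deviation of row counts for each (weekday, hour) slot."""
    buckets: dict[Slot, list[int]] = defaultdict(list)
    for ts, rows in history:
        buckets[(ts.weekday(), ts.hour)].append(rows)
    # stdev needs at least two observations per slot.
    return {slot: (mean(c), stdev(c)) for slot, c in buckets.items() if len(c) >= 2}


def volume_anomaly(ts: datetime, rows: int,
                   baseline: dict[Slot, tuple[float, float]],
                   z_threshold: float = 3.0) -> bool:
    """Flag a row count that deviates from its slot's learned distribution."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False  # not enough history for this slot yet
    mu, sigma = baseline[slot]
    if sigma == 0:
        return rows != mu  # perfectly stable history: any change is unusual
    return abs(rows - mu) / sigma > z_threshold


# Hypothetical history: rows loaded at 9 a.m. on four consecutive Mondays.
history = [(datetime(2024, 1, d, 9), n)
           for d, n in [(1, 1000), (8, 1020), (15, 980), (22, 1010)]]
baseline = learn_volume_baseline(history)
print(volume_anomaly(datetime(2024, 1, 29, 9, 30), 600, baseline))   # ~40% drop: True
print(volume_anomaly(datetime(2024, 1, 29, 9, 30), 1005, baseline))  # normal: False
```

Because each slot is compared only with itself, a quiet Friday afternoon is never judged against a busy Monday morning.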
3. Distribution
Distribution monitoring watches the statistical shape of column values (null rate, unique count, mean, percentiles) and alerts when that shape shifts unexpectedly.
A payment status column that normally has four distinct values suddenly showing 30 is a sign that something upstream changed. A revenue column whose mean drops from $180 to $12 is a data problem, not a business problem. Without distribution monitoring, the analyst dashboard shows the drop as real.
Null rate is the most actionable distribution signal for most teams. If a column that was 0.5% null last week is now 22% null, a join broke or a field stopped populating.
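A minimal sketch of null-rate and cardinality checks, assuming you can pull a column's values (or a sample of them) into memory. The profile fields, the five-percentage-point null tolerance, and the 2x cardinality ratio are hypothetical choices for illustration; real tools typically profile in SQL and learn these tolerances rather than hard-coding them.

```python
def column_profile(values: list) -> dict[str, float]:
    """Null rate and distinct-value count for one column's values."""
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": float(len(set(non_null))),
    }


def distribution_alerts(baseline: dict[str, float], current: dict[str, float],
                        max_null_increase: float = 0.05,
                        max_distinct_ratio: float = 2.0) -> list[str]:
    """Compare today's profile with a baseline profile; thresholds are illustrative."""
    alerts = []
    if current["null_rate"] - baseline["null_rate"] > max_null_increase:
        alerts.append(f"null rate {baseline['null_rate']:.1%} -> {current['null_rate']:.1%}")
    # Only catches cardinality growth (e.g. 4 statuses becoming 30).
    if current["distinct"] > baseline["distinct"] * max_distinct_ratio:
        alerts.append(f"distinct values {baseline['distinct']:.0f} -> {current['distinct']:.0f}")
    return alerts


# Hypothetical payment-status column: four statuses, about 1% null last week.
last_week = column_profile(["paid", "pending", "failed", "refunded"] * 25 + [None])
today = column_profile([f"status_{i}" for i in range(30)] + [None] * 10)
for alert in distribution_alerts(last_week, today):
    print(alert)
```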
4. Schema drift
Schema drift is any unexpected structural change to a table: a column added, removed, renamed, or retyped. Schema changes are among the most common causes of downstream breakages. A transformation that expects user_id as an integer can fail silently when the source starts sending strings.
Schema monitoring compares the current table structure against the last known-good state and alerts on any difference. It doesn't prevent schema changes. It ensures you know about them before the analyst does. See also: schema change.
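The comparison itself is a small diff. This sketch assumes each schema snapshot is a mapping of column name to type, such as you might build from the column_name and data_type fields of information_schema.columns in Postgres; the function name and sample columns are hypothetical. Note that a pure structural diff cannot tell a rename from a drop plus an add, so a renamed column shows up as both.

```python
def schema_diff(known_good: dict[str, str], current: dict[str, str]) -> list[str]:
    """List structural differences between a known-good schema and the current one."""
    changes = []
    for col in known_good.keys() - current.keys():
        changes.append(f"removed: {col}")
    for col in current.keys() - known_good.keys():
        changes.append(f"added: {col}")
    for col in known_good.keys() & current.keys():
        if known_good[col] != current[col]:
            changes.append(f"retyped: {col} {known_good[col]} -> {current[col]}")
    return sorted(changes)  # stable order for alert messages


# Hypothetical snapshots: email was renamed and user_id changed type upstream.
known_good = {"user_id": "integer", "email": "text", "created_at": "timestamp"}
current = {"user_id": "text", "email_address": "text", "created_at": "timestamp"}
for change in schema_diff(known_good, current):
    print(change)
```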
5. Lineage
Data lineage maps the path data travels from source to dashboard, showing which tables feed which downstream tables and reports. When a problem is detected in one of the other four pillars, lineage answers the impact question: which dashboards and models are affected by this broken table?
Without lineage, you fix the broken table and then spend two hours manually determining what broke downstream. With lineage, you get the blast radius immediately.
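Computing the blast radius is a graph traversal. A minimal sketch, assuming lineage is stored as a mapping from each asset to the assets that read from it directly (a dbt manifest.json, for instance, exposes a child_map in roughly this shape). All table and dashboard names below are hypothetical.

```python
from collections import deque


def blast_radius(children: dict[str, list[str]], broken: str) -> list[str]:
    """Every asset downstream of `broken`, in breadth-first (nearest-first) order."""
    seen = {broken}
    order: list[str] = []
    queue = deque([broken])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # shared descendants are reported once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: one raw table feeding two models and two dashboards.
lineage = {
    "raw.orders": ["stg_orders"],
    "stg_orders": ["fct_revenue", "fct_orders"],
    "fct_revenue": ["dash_revenue", "dash_board_pack"],
    "fct_orders": ["dash_board_pack"],
}
print(blast_radius(lineage, "raw.orders"))
```

Breadth-first order lists direct dependents before distant ones, which roughly matches the order in which you would check them.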
Data observability vs. monitoring vs. testing
The terms get confused. Here's the actual distinction:
| Approach | What it checks | When it runs | Who writes the rules |
|---|---|---|---|
| Data testing | Hard constraints (not null, unique, in range) | At pipeline run time | Engineers write explicit assertions |
| Data monitoring | Infrastructure health (job ran, volume within threshold) | Scheduled checks | Humans set static thresholds |
| Data observability | Statistical health of data itself, across all 5 pillars | Continuous, automated | Baselines learned automatically from history |
The three are complementary, not competing. Testing catches known failure modes. Monitoring catches infrastructure issues. Observability catches the unknown unknowns: the patterns nobody thought to write a test for.
How to start with data observability without a data team
Most content on this topic is written for data engineering teams at mid-sized companies. If you're a founder, a full-stack engineer, or an analytics engineer working solo, the practical advice is different.
The difference between a table alert and a metric alert: a table alert says "the orders table hasn't updated in 4 hours." A metric alert says "daily revenue dropped 31%. Here's the SQL showing it concentrated in mobile checkouts after 6 p.m." The first requires investigation. The second hands you the investigation already done.
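To make the "investigation already done" idea concrete, here is a hypothetical decomposition of a metric drop by segment. It is not Tabkeel's diagnostic query feature, whose internals this article doesn't describe. It assumes you already have a metric aggregated per segment (say, revenue by checkout platform) for a baseline day and for today. It reports the overall change and the segment that contributed most to the drop.

```python
def explain_drop(baseline: dict[str, float],
                 today: dict[str, float]) -> tuple[float, str, float]:
    """Overall fractional change, the segment with the largest decline,
    and that segment's share of the total decline."""
    segments = baseline.keys() | today.keys()
    deltas = {s: today.get(s, 0.0) - baseline.get(s, 0.0) for s in segments}
    total_before = sum(baseline.values())
    total_delta = sum(deltas.values())
    pct_change = total_delta / total_before if total_before else 0.0
    worst = min(deltas, key=deltas.get)  # most negative change
    share = deltas[worst] / total_delta if total_delta < 0 else 0.0
    return pct_change, worst, share


# Hypothetical daily revenue by platform.
baseline = {"web": 6000.0, "mobile": 4000.0}
today = {"web": 5900.0, "mobile": 1000.0}
change, segment, share = explain_drop(baseline, today)
print(f"revenue {change:+.0%}, {share:.0%} of the drop from {segment}")
```

A real diagnostic would repeat this across several dimensions (platform, hour, region) and pick the split that concentrates the change most.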
Tabkeel's Free plan monitors 10 tables and 2 business metrics, no credit card required. Run the free data quality check to see where your data stands before connecting anything.
What data observability doesn't replace
Before investing, be clear about what observability doesn't do:
- It doesn't replace deterministic data tests. If you need a hard guarantee that user_id is never null, write a test. Observability will catch when the null rate spikes, but it won't prevent the first null from landing.
- It doesn't give column-level dbt lineage out of the box. Table-level lineage is available in most tools; column-level lineage that tracks through dbt transformations requires deeper integration.
- It doesn't fix bad data. Observability surfaces problems. The fix still lives in your pipeline, source system, or transformation logic. Think of it as the smoke detector, not the fire suppression system.
See the full comparison of data observability tools, including where each falls on the deterministic-vs-statistical spectrum, to match the right approach to your stack. For teams thinking about monitoring business metrics like DAU, revenue, and churn, the diagnostic query feature is where observability goes from infrastructure concern to business concern.
The cost of skipping it
Gartner estimates poor data quality costs organizations an average of $12.9 million per year. Most of that isn't cleanup time. It's the downstream effect of decisions made on wrong numbers.
For small teams, the math is simpler: one wrong metric presented to a board, one product decision based on a broken cohort, one pricing change triggered by a pipeline duplicate. Observability pays back the first time you catch a problem before a stakeholder does.
See where your data stands before setting up any monitoring. The free 2-minute data quality check grades your setup A–F, no signup required.
Frequently asked questions
What is data observability in simple terms?
Data observability is the ability to know when your data is wrong before someone else tells you. It monitors five dimensions (freshness, volume, distribution, schema, and lineage) continuously and automatically, so problems surface in minutes rather than days.
What are the five pillars of data observability?
Freshness (is the data recent?), volume (is the right amount of data present?), distribution (do the values look statistically normal?), schema (has the table structure changed unexpectedly?), and lineage (which tables and dashboards depend on this data?). Each pillar catches a different failure mode that the others miss.
What is the difference between data observability and data monitoring?
Data monitoring checks infrastructure: did the job run, did volume stay within a manually set threshold? Data observability watches the data itself, learns what normal looks like, and detects anomalies without requiring humans to write explicit rules for every failure scenario.
Can you do data observability without a data team?
Yes. Modern tools connect via a read-only credential, learn baselines automatically, and alert without requiring you to write SQL or own a pipeline. The main requirement is knowing which tables and metrics matter most. That knowledge lives with founders and engineers, not just data teams.
How long does it take to set up data observability?
A read-only connection to Postgres, Supabase, or BigQuery takes under two minutes. The baseline learning period is 7–14 days. First meaningful alerts typically fire within the first week, as the system identifies tables or metrics that deviate from the pattern it starts learning on connection.