The Definitions Problem
The Data Report: Weekly State of the Market in Data Product Building | Week ending March 1, 2026
Joe Reis published a post this week titled “The Reckoning Is Already Here.” His claim: AI assistants now produce production-quality SQL, pipelines, and configs. The era of the data practitioner who doesn’t use AI tools is ending.
He’s probably right. But the week’s other stories suggest a different bottleneck.
A practitioner mapped 31 data quality tools. Most teams use none of them. A pipeline ran green and delivered zero rows. Three separate discussions arrived at the same conclusion: ontology (not AI) is the missing architectural layer. And a team with 40 Airflow DAGs asked where the self-healing pipeline is, because retries and backoff aren’t it.
AI can write the SQL. The question nobody’s answering: SQL against what definitions? What metric logic? What test criteria? What business ontology?
This week’s stories all point at the same gap. Not capability. Definitions.
The Reckoning
Joe Reis has been tracking this arc for two years. In early 2024, he called LLMs “not exactly useless, but not universally useful” and warned they “often create much more work than existing non-AI tools.” By mid-2025, he was writing that data is at a scale beyond human ability to manage. Last week he published “2028: THE GREAT DATA RECKONING,” a satirical memo from a future where those “over-indexed on tools and under-indexed on fundamentals” were the ones still employed.
This week’s follow-up, “The Reckoning Is Already Here,” pulls the timeline forward. His claim: something changed in the last month or two. A product manager can now describe what they want in plain English and receive a working DAG (tested, documented, deployed) in about 11 minutes. Data engineers whose value is “I know how to use dbt” are, in his framing, the railroad workers watching spike-driving machines arrive.
His own survey data backs part of this: 82% of 1,101 data engineers report daily AI usage. But 64% are still stuck in “experimenting” or “tactical tasks.” Only 10% have AI embedded in workflows. And a separate MIT/Snowflake survey found 77% of data engineers report heavier workloads despite AI tools. Astronomer’s State of Airflow report adds the punchline: over 80% use AI to write Airflow DAGs, but they “overwhelmingly report” hallucinations, missing context, and outdated syntax.
Reis isn’t wrong that the capability ceiling has risen. But his reckoning has a definitions problem. The 11-minute DAG works when someone has already defined the schema, the metric logic, and the acceptance criteria. The reckoning isn’t about whether AI can write the code. It’s about whether your organization has defined what “correct” means.
Understand: This framing will shape conference talks, hiring expectations, and vendor pitches for the rest of 2026. The practitioners who survive Reis’s reckoning aren’t the ones who adopt AI fastest. They’re the ones who can answer the question AI can’t: what should this pipeline actually produce?
The Promise vs. The Practice
Mendral published a case study this week that reads like a self-healing pipeline actually working. Their LLM agent queries ClickHouse over 1.5 billion CI log lines per week, writes its own SQL (no predefined queries), and closes 16,000 investigations per month. A single investigation involves 10 to 20 LLM calls and 30 to 50 tool executions. It can trace a flaky test to a dependency bump three weeks ago by correlating across hundreds of CI runs.
On the same Hacker News front page, practitioners debated whether this is the future or a well-funded outlier. Skeptics want concrete accuracy metrics. Proponents argue that orchestration and data modeling matter more than model choice. The 107-comment thread kept circling the same question: can you trust it?
ClickHouse published its own answer last year. In a study testing five leading models against real observability data, zero-shot accuracy for root cause analysis ranged from 44% to 58%. With prompt engineering, it climbed to 60% to 74%. Experienced humans with tools hit 80%+. Their conclusion: “Autonomous RCA is not there yet.”
Meanwhile, on Reddit, a practitioner with roughly 40 Airflow DAGs asked if anyone has found a self-healing pipeline tool that actually works. The 22-comment thread was unanimous: no. Most prefer fail-loud behavior with human review. Managed connectors (Fivetran, Airbyte) can absorb some schema drift, but that’s connector maintenance, not pipeline healing.
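The thread's distinction is worth making concrete. Retries and backoff are blunt re-execution: the same code runs again after a delay, with no diagnosis of why it failed. A minimal sketch (generic Python, not any specific orchestrator's API) of what "retries and backoff" actually buys you:

```python
import time

def run_with_backoff(task, retries=3, base_delay=1.0):
    """Re-execute a failing task with exponential backoff.

    Note what this does NOT do: inspect the error, fix a schema,
    refresh a credential, or change anything about the next attempt.
    It just runs the same code again and hopes.
    """
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # fail loud after exhausting retries
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

This handles transient faults (a flaky network call, a briefly locked table) and nothing else, which is exactly why the thread's consensus favored fail-loud behavior with human review over "self-healing" claims.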
The gap is clear. AI excels at structured investigation: querying well-indexed data, correlating patterns, summarizing findings. It fails at the messy operational reality: the 3 AM DAG failure where an upstream schema changed, a credential expired, and the retry logic hit a race condition. Soda’s survey found 61% of data engineers spend half or more of their time handling data issues. AI isn’t reducing that number yet.
Try: LLM agents for structured debugging against well-modeled data (Mendral’s approach). Avoid: vendor claims about autonomous pipeline remediation. The gap between structured investigation and messy operations is where most teams actually live.
31 Tools and Nobody’s Testing
A Reddit thread this week mapped 31 data quality tools. The community’s verdict: most teams use dbt tests or nothing at all.
This shouldn’t be surprising. DataKitchen’s 2026 landscape catalogs over 50 commercial DQ vendors, plus a separate open-source ecosystem. The category exploded between 2017 and 2022: Great Expectations (2017), Soda (2018), Monte Carlo (2019), Datafold (2020), Elementary (2021). Monte Carlo hit unicorn status in 2022. Great Expectations raised $40M the same year.
Three years later, the market is consolidating. Datadog acquired Metaplane in April 2025. Snowflake acquired Select Star. The venture-funded wave is hitting a wall: most teams either can’t justify a separate vendor or won’t adopt one.
Why? Because dbt’s four generic tests (unique, not_null, relationships, accepted_values) ship free, run in the same repo, and require zero additional infrastructure. Add dbt-utils and dbt-expectations, and you’ve covered most failure modes without adding a vendor. dbt’s v1.8 unit testing framework made the case even harder for standalone tools.
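For readers who haven't set these up: the four generic tests are a few lines of YAML in the model's schema file. A minimal sketch, with illustrative model and column names:

```yaml
# models/schema.yml -- model and column names are hypothetical
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id
```

No vendor, no extra infrastructure, same repo as the models. That is the bar a 32nd standalone tool has to clear.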
And yet: dbt Labs’ own 2024 survey shows 57% of practitioners cite poor data quality as their chief obstacle, up from 41% in 2022. It’s getting worse, not better. The tools exist. The practice doesn’t.
A second thread this week illustrated why. A pipeline ran green and delivered zero rows. The discussion (48 comments) landed on familiar ground: limited time, unclear ownership, and no upfront value proposition for testing. Teams add tests reactively, after an incident. The debate wasn’t about which tool to use. It was about whether to test at all.
The cost of not testing is documented. Unity Technologies lost $110M in Q1 2022 when bad training data corrupted its ad targeting models (37% stock drop). Uber underpaid tens of thousands of drivers for years because nobody checked the commission calculation. These aren’t tool problems. They’re definition problems: nobody defined what “correct output” looked like, so the pipeline delivered whatever it produced.
Adopt: Start with dbt’s four generic tests on every primary key. Add row-count and freshness checks on critical tables. You don’t need tool number 32. You need the discipline to define what “correct” means for each pipeline, and the organizational will to enforce it.
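The zero-rows-on-green failure mode above is cheap to catch. A minimal fail-loud sketch in plain Python (function and threshold names are illustrative, not from any specific framework):

```python
from datetime import datetime, timedelta, timezone

def check_rows_and_freshness(rows, loaded_at, min_rows=1,
                             max_age=timedelta(hours=24)):
    """Fail loud: a green run that delivers zero rows is a failure.

    rows      -- row count the pipeline actually landed
    loaded_at -- UTC timestamp of the most recent load
    """
    if rows < min_rows:
        raise ValueError(
            f"Row-count check failed: got {rows}, expected >= {min_rows}")
    age = datetime.now(timezone.utc) - loaded_at
    if age > max_age:
        raise ValueError(
            f"Freshness check failed: data is {age} old, max {max_age}")
    return True
```

Two guards like these, run as a final pipeline step, turn "ran green, delivered nothing" from a silent success into a paged failure.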
The Ontology Moment
Three independent stories this week converge on the same idea: ontology is the missing architectural layer.
A Reddit post argued for ontology-driven data modeling: capture your business ontology first, then let LLMs generate the data model. The 31-comment discussion split predictably. Skeptics said ontology is already implicit in data modeling. Proponents reported success using ontology-first, question-driven approaches to bootstrap models for new clients.
On Hacker News, an open-source deep dive into Palantir’s architecture made the case that Palantir’s moat isn’t AI. It’s their Ontology: an executable digital twin that unifies objects, links, and actions into a queryable layer. The 59-comment thread was contentious. Some called it marketing gloss over standard SQL and graph concepts. Others credited Palantir for doing the unglamorous work of integrating messy enterprise data into a coherent model, something most organizations won’t invest in.
A third thread, on metric governance in a world of AI agents, asked the question that ties these together: how do you ensure AI agents use correct metrics when your semantic layer lags behind reality and not all metrics live in the warehouse?
The concept isn’t new. Business Objects built the first semantic layer in 1991. Tim Berners-Lee’s Semantic Web vision dates to 2001 (it mostly failed). Google’s Knowledge Graph (2012) proved ontology works at scale when you control the data. What’s changed is the pressure. AI agents need definitions to operate correctly. Without an explicit ontology, LLMs hallucinate entity relationships. Without metric definitions, agents generate plausible but wrong business logic. The Open Semantic Initiative (launched September 2025) and Microsoft’s Fabric IQ (November 2025) are early signals that the industry is starting to formalize this.
If your team uses a semantic layer, you’re partway there. A semantic layer defines metrics and dimensions. Ontology goes further: entity relationships, business rules, domain constraints, the full vocabulary your organization uses to describe what it does. It’s the difference between defining “revenue” and defining the business model that produces it.
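To make the distinction tangible, here is a deliberately simplified Python sketch (all names and structures are hypothetical, for illustration only): a semantic layer stops at the metric definition, while an ontology also carries the entities, relationships, and rules the metric depends on.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    """Semantic layer: what the metric is and how to slice it."""
    name: str
    sql: str            # aggregation logic
    dimensions: list

@dataclass
class Entity:
    """Ontology: the business objects and rules behind the metric."""
    name: str
    relationships: dict = field(default_factory=dict)  # e.g. {"order": "one-to-many"}
    rules: list = field(default_factory=list)          # domain constraints

# The semantic layer defines "revenue"...
revenue = Metric("revenue", "SUM(order_total)", ["region", "month"])

# ...the ontology defines the business model that produces it.
customer = Entity(
    "customer",
    relationships={"order": "one-to-many"},
    rules=["revenue counts only orders with status = 'shipped'"],
)
```

An AI agent handed only the `Metric` can write a plausible query; handed the `Entity` layer too, it knows which orders are allowed to count.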
Understand: Ontology is moving from academic concept to practical architecture concern. As AI agents proliferate, teams without explicit definitions face compounding governance gaps. The semantic layer was step one. Ontology is the step most teams haven’t taken.
The Thread
Joe Reis says the reckoning is here. The tools can write production SQL, generate DAGs, and query terabytes of logs autonomously. He’s right about the capability. But every other story this week points at the same gap.
A pipeline delivers zero rows and counts as success, because nobody defined what success looks like. 50+ data quality tools exist and most teams use none of them, because adopting a tool requires first defining what to test. Three conversations arrive independently at ontology as the missing layer, because AI agents need explicit definitions to operate correctly.
The reckoning isn’t about whether AI can write the code. It’s about whether you’ve defined what “correct” means: the metric logic, the test criteria, the business ontology. AI accelerates whatever you’ve built. If you’ve built on undefined foundations, it accelerates the chaos.
The practitioners who come out ahead aren’t the ones who adopt AI fastest. They’re the ones who invest in the definitions that make AI useful.


