The Modeling Reckoning
The Data Report: Weekly State of the Market in Data Product Building | Week ending February 15, 2026
The data engineering profession doesn’t often stop to measure itself. This week it did, from three directions at once.
Joe Reis surveyed 1,101 practitioners. A separate report gathered 1,000+ responses. And Reddit held a nine-year retrospective on Max Beauchemin’s “The Rise of the Data Engineer.” The findings line up: 82% use AI daily. Only 5% have semantic models. Infrastructure is a solved problem. Modeling isn’t.
That 5% number is the through-line for everything else this week. dbt Labs held an AMA where the loudest questions weren’t about AI features but about intermediate materializations, pricing, and whether the Fivetran merger changes what Core users can expect. A senior data engineer used Claude Code and a MotherDuck MCP server to build a dbt data mart from messy ERP data in hours. Research confirmed that the harness you wrap around a coding agent matters more than which model runs inside it.
The profession’s reckoning is clear: the pipes are strong, the semantics are weak, and AI just made the gap between the two impossible to ignore.
Two Surveys, One Diagnosis
The data engineering profession has been measuring itself for years, but rarely from this many angles at once.
Joe Reis’s 2026 survey of 1,101 practitioners landed alongside a separate 1,000+ respondent report, both asking the same question: where are we? The answers converge. AI is everywhere (82% daily use in the Reis survey) but unevenly effective. Only 5% of teams use semantic models. 59% cite “pressure to move fast” as the top modeling pain point. 51% say nobody owns data modeling at their org.
Meanwhile, Reddit’s r/dataengineering held an informal nine-year retrospective on Max Beauchemin’s foundational “The Rise of the Data Engineer.” The verdict there matches the surveys: infrastructure got dramatically easier. Managed cloud, ELT, dbt. All standardized. But governance, data quality, and ownership? Still hard. And the role itself remains loosely defined, spanning DevOps, analytics, domain translation, and sometimes frontend.
This isn’t a new diagnosis. Chad Sanderson wrote about “The Death of Data Modeling” in 2022. Tim Hiebenthal argued dbt made it so easy to write SQL that teams skipped the design step entirely. What’s different in 2026 is the scale of the evidence: two large-sample surveys, nine years of hindsight, and the same blind spot.
Understand: The profession solved the plumbing problem. The modeling problem is next. If your metrics aren’t defined, your models aren’t documented, and nobody owns data quality, the surveys say you’re in the majority. That’s both reassuring and concerning.
dbt’s Post-Merger Identity Crisis
Three weeks ago, the Fivetran pricing spike dominated this report’s conversation. This week, the other side of the merger had its turn.
dbt Labs held an AMA on Reddit to discuss Core 1.11, AI features (MCP server, ADE bench, agent skills), and Fusion GA timing. The 100 comments that followed read less like Q&A and more like couples therapy.
The context matters. The Fivetran-dbt merger closed in late 2025 as an all-stock deal approaching $600M combined ARR. A month earlier, Fivetran had acquired Tobiko Data (the makers of SQLMesh), which means the most visible dbt alternative is now owned by the same parent company. That complicates exit stories.
What the community actually wanted to talk about: intermediate materializations (a longstanding feature request), streaming workloads, and whether Cloud-first features will keep widening the gap with Core. Enterprise seat pricing came up repeatedly, with multiple practitioners reporting that trust has eroded. Only ~12% of dbt’s user base is on Cloud; the 88% on Core are watching closely.
The dbt pricing playbook isn’t new. 100-700% increases in late 2022, consumption-based pricing in 2023, and Fivetran’s own history of 4-8x jumps. The merger amplifies the concern: if one company now controls both ingestion and transformation, pricing leverage increases.
Watch: If you’re on dbt Cloud, Fusion GA timing and the next pricing cycle will define the value proposition. If you’re on Core, the community’s anxiety is a signal, not a reason to panic. But with SQLMesh now under the same corporate umbrella, the “alternative” landscape is thinner than it was six months ago.
The Agent That Modeled
A senior data engineer posted a detailed account of using Claude Code with a MotherDuck MCP server to build a complete dbt+DuckDB data mart from messy legacy ERP data in MSSQL. The agent explored the source data, generated staging/fact/aggregate models with tests, and iterated through QA. What would normally take weeks compressed into hours.
The key: the practitioner didn’t just point an agent at a database and hope. They gave it explicit conventions (raw > stg > fct > agg), domain context, and analytical use cases. The agent produced; the human verified. The community’s reaction split predictably between ERD purists and one-big-table advocates, but the real signal is that the workflow produced working, tested models.
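The staging layer in that raw > stg > fct > agg flow is the easiest part to picture. A minimal sketch of the kind of model such a workflow might produce, with a hypothetical ERP source and invented column names (nothing here is taken from the original post):

```sql
-- models/staging/stg_orders.sql (illustrative; source and columns are hypothetical)
-- Staging layer: rename, cast, and lightly clean raw ERP columns. No joins,
-- no business logic -- those belong in fct_ models downstream.
with source as (
    select * from {{ source('erp', 'orders') }}
)

select
    cast(order_id as integer)       as order_id,
    cast(customer_id as integer)    as customer_id,
    cast(order_date as date)        as order_date,
    lower(trim(status))             as order_status,
    cast(amount as decimal(18, 2))  as order_amount
from source
where order_id is not null
```

Downstream fct_ models join staging models at a declared grain, and `unique`/`not_null` tests on the keys live in a companion schema.yml. The value of giving an agent this convention is that every generated model lands in a predictable place with a predictable shape.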
Separately, a Hacker News post demonstrated that improving 15 LLMs’ coding performance came down to changing the harness, not the model. Replacing brittle edit methods (apply_patch, str_replace) with model-agnostic tools using stable line identifiers lifted reliability across every model tested.
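The intuition behind stable line identifiers is easy to show in miniature. A toy sketch (not the posted tool): a string-match edit breaks when the target text repeats or drifts, and a line-number edit breaks when earlier edits shift the file, but an ID assigned once and never reused survives both.

```python
import itertools

class LineBuffer:
    """Toy edit buffer where edits target stable line IDs, not line numbers.

    IDs are assigned once when a line enters the buffer and are never
    reused, so an edit that inserts or deletes lines above a target
    does not invalidate a pending edit that references the target's ID.
    """

    def __init__(self, text):
        self._ids = itertools.count(1)
        # Each line keeps its ID for the life of the buffer.
        self.lines = [(next(self._ids), line) for line in text.splitlines()]

    def replace(self, line_id, new_text):
        self.lines = [(i, new_text if i == line_id else t) for i, t in self.lines]

    def insert_after(self, line_id, new_text):
        out = []
        for i, t in self.lines:
            out.append((i, t))
            if i == line_id:
                out.append((next(self._ids), new_text))
        self.lines = out

    def text(self):
        return "\n".join(t for _, t in self.lines)


buf = LineBuffer("a = 1\nb = 2\nc = a + b")
buf.insert_after(1, "a += 10")   # shifts everything below the first line...
buf.replace(3, "c = a * b")      # ...but ID 3 still points at the original third line
print(buf.text())
```

The same property is what makes such a tool model-agnostic: the agent never has to reproduce the file's current text exactly, only name a line it has already seen.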
The concept of harness engineering has solidified fast. Anthropic published guidance on long-running agent harnesses in November 2025. OpenAI described building a product with ~1M lines of code and zero manually written lines, arguing the engineering team’s job shifted entirely to designing environments and feedback loops. The pattern: context and structure beat raw model power.
For data engineering specifically, MCP is the enabler. Launched by Anthropic in November 2024, adopted by OpenAI and Google in 2025, and donated to the Linux Foundation in December 2025, it connects agents to databases, Git repos, and tools without custom integration work. The MotherDuck MCP server in this week’s story gave Claude Code direct access to query and explore the data.
Try: The workflow is reproducible. Claude Code + an MCP server for your database + clear modeling conventions in a CLAUDE.md file. The investment is in the harness (your conventions, your domain context, your QA process), not in chasing the latest model release. AI doesn’t replace modeling skill. It amplifies it.
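A conventions file for this workflow might look something like the following. Structure and wording are illustrative, not taken from the original post; the point is that every rule here is something the agent would otherwise have to guess:

```markdown
# Project conventions for the modeling agent (illustrative sketch)

## Layers
- raw: untouched source tables; never queried by marts directly
- stg_: one model per source table; rename, cast, deduplicate only
- fct_: business-grain facts; joins and business logic live here
- agg_: reporting rollups built on fct_ models only

## Rules
- Every stg_ and fct_ model gets unique + not_null tests on its key
- Column names are snake_case; money columns are decimal(18, 2)
- Ask before inventing a business definition; check docs/metrics.md first
```

The last rule is the important one: it turns missing semantics into a question for a human instead of a silent guess.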
The Semantic Layer Gap
Here’s the number that ties everything together: 82% of practitioners use AI daily, but only 5% have semantic models.
Joe Reis’s survey surfaced this gap explicitly. It’s not that teams don’t know semantic layers exist. It’s that the organizational cost of defining metrics, getting cross-team agreement, and maintaining definitions is higher than most teams are willing to pay. The five classic traps haven’t changed: analysis paralysis over which metrics to define first, cross-team trust gaps, complexity overhead, user reversion, and the prerequisite of data consolidation.
The technology isn’t the blocker. The semantic layer market has matured considerably since Looker’s LookML first proved the concept in 2013. dbt acquired Transform in February 2023 and brought MetricFlow to GA by October 2024. Cube runs as open-source middleware between warehouses and BI tools. Snowflake and Databricks have been building native semantic layers. Drew Banin and Nick Handel debated the metrics layer’s future publicly in 2022; four years later, the architecture question is largely settled. Three patterns work: warehouse-native, transformation-layer (MetricFlow), and OLAP-acceleration (Cube).
What hasn’t been settled is organizational adoption. The surveys this week confirm it. And the AI story this week illustrates why it matters: the practitioner who built a data mart with Claude Code succeeded partly because they had conventions and business definitions to give the agent. Without that layer, the agent would produce models that technically work but semantically mean nothing.
AI makes this gap urgent. Every team deploying AI on top of their data is, whether they know it or not, building on whatever semantic foundation exists. For 95% of teams, that foundation is implicit, scattered across BI tool definitions, tribal knowledge, and undocumented SQL.
Adopt: If you’re investing in AI features, investing in semantic definitions first is not optional. The tooling exists: MetricFlow, Cube, or even a well-structured set of dbt metrics. The 5% who have semantic models aren’t just better organized. They’re the ones whose AI features will actually work.
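For teams starting from zero, a single agreed-on metric in dbt’s semantic layer (MetricFlow) looks roughly like this. The model and column names are hypothetical, and the definition itself is the hard part, not the YAML:

```yaml
# models/marts/orders.yml (illustrative sketch; fct_orders is a hypothetical model)
semantic_models:
  - name: orders
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum
        expr: order_amount

metrics:
  - name: revenue
    description: "Order revenue. The one definition every tool and agent queries."
    type: simple
    type_params:
      measure: order_total
```

Once this exists, BI tools, notebooks, and AI agents all resolve "revenue" to the same expression, which is precisely the foundation the 95% are missing.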
The Thread
Nine years of progress, and the blind spot is the same one it was at the start.
The profession built the pipes. Managed cloud, ELT, orchestration, warehouses: all mature, all commoditized. AI arrived and made everything faster. But faster at what? For the 95% without semantic models, faster means more dashboards with inconsistent metrics, more pipelines without documented business logic, more AI features built on implicit definitions that nobody agreed on.
The dbt community’s anxiety isn’t really about pricing or merger politics. It’s about whether the tools that were supposed to solve the modeling problem will still prioritize it. The practitioner who modeled a data mart with Claude Code in hours succeeded because they had conventions to give the agent. Most teams don’t.
The modeling reckoning isn’t coming. The surveys say it’s here.


