The knowledge gap underneath the tooling
The Data Product Report: Weekly State of the Market in Data Product Building | Week ending May 19, 2026
This Week
A semantic layer is supposed to make your agent smarter. This week, a benchmark from semantic-layer vendor Cube put a number on how far that goes: roughly 68% accuracy. That is the ceiling when an LLM has metric definitions but not the business reasoning underneath them. Good enough to demo, not good enough to trust. A survey of 334 data practitioners published the same week revealed why the reasoning stays undocumented: at 42.5% of organizations, data models belong to whoever built the pipeline last.
Your Agent Keeps Getting the Answer Wrong
Cube has spent years convincing analytics teams that a centralized metrics store would save them from dashboard chaos. It worked, mostly. This week, Cube announced a strategic pivot that reveals the next customer for all that centralized logic: not the analyst refreshing Looker at 9am, but the AI agent trying to answer “what was Q2 revenue in EMEA” without hallucinating the number. Cube Core, the open-source modeling layer, stays as it is. The commercial product is repositioning around agent consumption. Metrics, dimensions, access controls, business definitions: all served via APIs to guide natural-language-to-SQL and enforce governance.
The bet makes intuitive sense. If a semantic layer already translates business concepts into SQL for humans, an AI agent calling that same layer should get the same translations. The problem is the accuracy number. A benchmark by Cube (examined in a Republic of Data analysis) found that adding a semantic layer improves LLM question-answering accuracy, but that accuracy plateaus at roughly 68%. The number is directional, not definitive, but the architectural implication is sharp. That ceiling appears when the agent has access to metric definitions (what “revenue” measures, which table it queries) but not the reasoning chain underneath: why EMEA excludes trial accounts, which quarter-end adjustments are baked into the calculation, what assumptions were contested during the last planning cycle. Schema metadata gets you to 68%. The missing layer is what the researchers call “semantic lineage”: a graph capturing not just what a metric measures, but why it is defined that way.
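To make “semantic lineage” concrete, here is a minimal Python sketch of what such a graph could look like. It is not Cube's schema or any vendor's API: the class names, fields, SQL strings, owners, and the stated reasons are all hypothetical, chosen only to show a metric carrying its “why” alongside its “what.”

```python
from dataclasses import dataclass, field


@dataclass
class Rationale:
    """One recorded reason behind a metric definition (hypothetical schema)."""
    decision: str  # what was decided, e.g. a scope rule
    reason: str    # why it was decided
    owner: str     # who can confirm or revise it


@dataclass
class Metric:
    name: str
    sql: str  # the "what": how the metric is computed (illustrative SQL only)
    rationale: list[Rationale] = field(default_factory=list)  # the "why"
    depends_on: list[str] = field(default_factory=list)       # upstream metrics


def agent_context(metrics: dict[str, Metric], name: str) -> str:
    """Walk the lineage graph depth-first; render definitions plus reasoning."""
    lines: list[str] = []
    seen: set[str] = set()

    def visit(metric_name: str) -> None:
        if metric_name in seen:  # each metric is rendered once
            return
        seen.add(metric_name)
        m = metrics[metric_name]
        lines.append(f"{m.name}: {m.sql}")
        for r in m.rationale:
            lines.append(f"  because: {r.decision} ({r.reason}; owner: {r.owner})")
        for upstream in m.depends_on:
            visit(upstream)

    visit(name)
    return "\n".join(lines)


# Hypothetical entries; the reasons are invented for illustration.
METRICS = {
    "net_revenue": Metric(
        name="net_revenue",
        sql="SUM(amount) - SUM(refunds)",
        rationale=[Rationale("apply quarter-end adjustments",
                             "finance closes the books after the quarter ends",
                             "finance-ops")],
    ),
    "emea_revenue": Metric(
        name="emea_revenue",
        sql="SUM(net_revenue) FILTER (WHERE region = 'EMEA')",
        rationale=[Rationale("exclude trial accounts",
                             "trials are not billed, so they would distort revenue",
                             "emea-analytics")],
        depends_on=["net_revenue"],
    ),
}

print(agent_context(METRICS, "emea_revenue"))
```

The point of the design is that the text handed to the agent includes the reasoning for the metric and for everything upstream of it, not just the SQL. Whether that closes the accuracy gap is exactly what the benchmark leaves open.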
The dbt Roundup’s AI Council report published this week mapped the emerging infrastructure into three lanes: context providers (semantic layers, metadata catalogs), agent orchestrators (the frameworks that plan and execute queries), and compute/inference (the models and databases underneath). Agent benchmarks show measurable accuracy gains when the dbt Semantic Layer is available, but the report is careful to note that benchmark results routinely overstate real-world performance. The framework matters because it clarifies where the semantic layer sits (context provider) and what it cannot do alone: reason about business logic it was never given.
Every analytics tool is adding an “AI” tier this year. That is not news. What is notable is the honesty of the accuracy data. The vendors building agentic analytics platforms are publishing numbers that show those platforms are not ready for production trust. The 68% ceiling is not a model problem or a prompt-engineering problem. It is a knowledge-capture problem. The metric definitions are there. The reasoning chains are not. And capturing those chains (why this metric, why this scope, why this exception) is a human documentation task that no model capability can shortcut.
The bottom line: The analytics engineering teams that have started documenting why their metrics are defined the way they are, not just what they measure, are the ones whose agents will break past the accuracy ceiling. Everyone else is shipping a 68%-right answer and calling it automation.
The Pipeline Builder’s Accidental Second Job
The documentation that would break the accuracy ceiling does not exist because nobody is responsible for writing it. This week, Joe Reis published survey results from 334 data professionals that put numbers to what most data teams already feel: 90% of data modeling failures are organizational, not technical. Only 4.8% of respondents cited tooling as the main fix. The rest pointed to training, requirements, time, and, above all, ownership.
The ownership breakdown is the kind of number that makes you close your laptop for a minute. Only 19.2% of organizations have a dedicated data modeler. At 42.5% of organizations, data models belong to “whoever builds the pipeline.” Another 7.8% of respondents reported that nobody owns data modeling at all. The organizations that enforce modeling standards, the ones with review processes and naming conventions and documented requirements, report models that hold up roughly five times longer. Not a marginal improvement. A structural one.
The same week delivered the counter-example. A practitioner published a cross-warehouse SQL cookbook for transaction fraud detection covering velocity checks, impossible-travel detection using LAG and haversine calculations, and amount-anomaly scoring, with working syntax across Snowflake, BigQuery, Databricks, Teradata, and Postgres. The cookbook handles the kind of cross-dialect friction that eats hours in practice: QUALIFY where it is available, CTE workarounds where it is not. This is what encoded institutional knowledge looks like when someone actually owns it. Portable, reusable, specific enough to drop into a pipeline by Friday.
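For readers who have not seen the pattern, impossible-travel detection can be sketched in a few lines of Python. This is not the cookbook's SQL: the record fields, the 900 km/h threshold, and the function names are assumptions made for illustration. The loop that remembers the previous transaction per card plays the role that LAG plays in the SQL version.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumed threshold, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(transactions, max_kmh=MAX_PLAUSIBLE_KMH):
    """Return ids of transactions whose implied speed from the card's
    previous transaction exceeds max_kmh."""
    flagged = []
    previous = {}  # card_id -> previous transaction (the LAG equivalent)
    for txn in sorted(transactions, key=lambda t: (t["card_id"], t["ts"])):
        prev = previous.get(txn["card_id"])
        if prev is not None:
            hours = (txn["ts"] - prev["ts"]).total_seconds() / 3600
            km = haversine_km(prev["lat"], prev["lon"], txn["lat"], txn["lon"])
            # Same timestamp at different locations also counts as impossible.
            if km > 0 and (hours == 0 or km / hours > max_kmh):
                flagged.append(txn["txn_id"])
        previous[txn["card_id"]] = txn
    return flagged


# Hypothetical sample: card c1 appears in Paris, then New York an hour later.
t0 = datetime(2026, 5, 19, 9, 0)
SAMPLE = [
    {"txn_id": "a", "card_id": "c1", "ts": t0, "lat": 48.8566, "lon": 2.3522},
    {"txn_id": "b", "card_id": "c1", "ts": t0 + timedelta(hours=1), "lat": 40.7128, "lon": -74.0060},
    {"txn_id": "c", "card_id": "c2", "ts": t0, "lat": 48.8566, "lon": 2.3522},
    {"txn_id": "d", "card_id": "c2", "ts": t0 + timedelta(hours=1), "lat": 48.8600, "lon": 2.3500},
]

print(impossible_travel(SAMPLE))
```

In a warehouse, the same logic would typically live in a window function partitioned by card and ordered by timestamp. The threshold is a business decision, which is precisely the kind of reasoning worth writing down next to the query.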
The survey and the cookbook are two sides of the same coin. When nobody owns the models, institutional knowledge stays in someone’s head and evaporates when they change teams. When someone does own it, the knowledge becomes a durable artifact that outlasts the person who wrote it. The 5x durability finding is not about better tooling. It is about the decision to treat data models as something worth maintaining, not just something that gets built on the way to the next dashboard.
The bottom line: The teams with enforced modeling standards and dedicated ownership reported models that last roughly five times longer. The tools were the same across the board. The difference was organizational: someone decided the models were worth owning.
The Radar
If you’re evaluating model architectures:
Interfaze shipped a hybrid CNN/transformer purpose-built for deterministic tasks: OCR, vision, structured extraction. It is topping benchmarks against general-purpose LLMs on those workloads at roughly $1.50/$3.50 per million tokens. If your pipeline runs extraction or entity recognition and you have been defaulting to a frontier model, this is the kind of purpose-built alternative worth benchmarking.
If you’re building infrastructure:
Ardent (YC P26) offers copy-on-write Postgres branching from existing RDS and Supabase instances with built-in data obfuscation, targeting CI pipelines and AI agent testing. The community reception was skeptical about the moat versus Neon, Supabase branching, and DBLab. If you are managing snapshot-based testing environments, it is worth a look. Mind the data-residency question: production data leaves your boundary.
If you care about governance:
Claude Platform on AWS shipped as a fully managed, Anthropic-operated AI service alongside Bedrock’s AWS-operated option. Same model family, two compliance postures: one keeps data in the AWS boundary, one does not. If your team is evaluating managed AI agents, the compliance split is the decision that matters, not the feature list.
If you manage a data team:
A CMU/MIT/Oxford study found that 10 minutes of AI coding assistance measurably increases quitting behavior and error rates once the AI is removed. The community debate was heated on whether the finding generalizes beyond the study’s controlled setting. If your team uses AI coding tools daily, the dependency question is worth one retro conversation.
Does your team document why metrics are defined the way they are, or just what they measure? We are curious what the semantic lineage gap looks like in practice. Reply and tell us.
The Data Product Report is published every Tuesday by RepublicOfData.io.


