Ask the Table Before You Optimize It

The Data Report: Weekly market signals on modern data platform shifts | Week ending June 29, 2026

Jun 30, 2026

The 30-second version
Fabric made “has this table actually drifted?” a single T-SQL call. Gate your compaction job on it and you stop paying to reorganize tables that never changed.
The credible way to let an agent build pipelines isn’t a sharper prompt. It’s writing your house standards down as skills the agent has to follow. dltHub’s Context Layer is the open version of that idea.
In the Radar: data-quality checks moving into the commit loop, the where-do-your-metrics-live question, and two more vendors putting agent workloads on plain Postgres.

This Week

Somewhere in your platform a job runs OPTIMIZE on every table every night, and most of those tables have not changed since the last run. You are paying to reorganize data that was already fine, and the bill has no line item that admits it. This week Microsoft Fabric turned the thing you were missing into a single T-SQL call: ask a table whether it has actually drifted before you compact it. It’s a small release, but it converts a standing, invisible cost into a measured, conditional step, and it drops into the orchestration you already run.

A second signal, from a different corner of the stack, is worth the same kind of attention. Out in the ingestion layer, the question about agents and pipelines quietly stopped being whether an agent can build one. dltHub shipped a version where the entire selling point is the guardrails.

You’re Compacting Tables That Didn’t Move

Up front: blind nightly OPTIMIZE bills you for tables that never moved. Fabric just made “is this one actually drifting?” a single call you can gate the job on.

You almost certainly compact your lakehouse tables on a schedule. Nightly, weekly, some cron you set up once and stopped thinking about. The schedule is honest about one thing and silent about another: it runs OPTIMIZE whether or not the table needed it, and the compute you spend rewriting a table that hasn’t changed since yesterday looks identical on the bill to the compute that actually earned its keep. You can’t see the waste because nothing measures it.

This week Microsoft Fabric closed that gap. sp_get_table_health_metrics, a read-only T-SQL stored procedure in the SQL analytics endpoint, reached general availability. Point it at a table and it hands back anomaly flags and storage and layout metrics, the same drift signals you’d want before deciding whether maintenance is worth running. It’s plain T-SQL, which is the part that matters: it calls cleanly from Fabric Pipelines, Azure Data Factory, or dbt, so the check slots into the orchestration you already have rather than asking for new tooling.

The move it unlocks is the small, unglamorous one platform owners actually act on: check, then act. Instead of compacting every table on a blind schedule, you read the health metrics first and fire OPTIMIZE only on the tables flagged as drifted.[1] The procedure is the news, but the pattern is the takeaway, and the pattern isn’t Fabric’s. “Measure drift, then conditionally maintain” applies to any Delta or Spark lakehouse owner carrying a compaction bill. Fabric just made the measuring step a one-liner.
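A minimal sketch of the check-then-act gate, in Python. The field names (`anomaly_flagged`, `small_file_ratio`) and the 0.3 threshold are hypothetical placeholders, not the procedure’s actual output columns; map them to whatever sp_get_table_health_metrics returns in your environment. The sketch only decides which tables qualify. Fetching the metrics and running OPTIMIZE are left to your orchestrator.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TableHealth:
    """One table's health reading.

    Field names are hypothetical, not the stored procedure's real columns.
    """
    table: str
    anomaly_flagged: bool     # assumed: at least one anomaly flag was raised
    small_file_ratio: float   # assumed: share of files below a target size


def tables_to_optimize(readings: Iterable[TableHealth],
                       max_small_file_ratio: float = 0.3) -> list[str]:
    """Return only the tables whose readings suggest drift."""
    return [
        r.table
        for r in readings
        if r.anomaly_flagged or r.small_file_ratio > max_small_file_ratio
    ]


# Illustrative readings; in practice these come from the health check.
readings = [
    TableHealth("sales.orders", anomaly_flagged=True, small_file_ratio=0.10),
    TableHealth("sales.customers", anomaly_flagged=False, small_file_ratio=0.05),
    TableHealth("web.events", anomaly_flagged=False, small_file_ratio=0.62),
]

for name in tables_to_optimize(readings):
    # A real job would hand this table to the notebook or job that compacts it.
    print(f"OPTIMIZE {name}")
```

Keeping the decision in a pure function means the threshold can be tested and tuned without touching the job that spends the compute.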

Bottom line: The teams that put a health check in front of their compaction job this week stopped paying to reorganize tables that hadn’t moved. The ones still on a blind nightly schedule are buying maintenance they have no way to see.

The New Pitch Is Guardrails, Not Autonomy

Up front: the credible way to let an agent build pipelines isn’t a better prompt. It’s encoding your standards as skills it has to follow.

If you lead a data team, the live question about agents and pipelines isn’t the one the demos answer. You’ve seen an agent write a pipeline. What you actually want to know is what stops it from writing one that ignores your medallion layers, picks the wrong grain, names everything its own way, and routes around your governance. The interesting design problem was never the agent’s autonomy. It’s the leash.

dltHub’s Context Layer is built around that premise, and it’s worth noticing because dlt is a widely used open-source ingestion library, not a closed demo. Instead of free-form prompting, it compiles a high-level ask into a fixed, guardrailed skill chain: find the source, create the pipeline, debug it, validate it, view the result. Persistent context (your schemas, code, deployments, and logs) sits underneath so the agent works against what your pipelines actually look like, not a blank prompt. The agent is scoped to that chain. It’s a junior engineer following a runbook, not a free author.
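To make “scoped to that chain” concrete, here is a minimal Python sketch of a fixed, ordered skill chain. It is not dltHub’s API: `Skill` and `SkillChain` are hypothetical names, and the strictly linear order is a simplification (a real chain would likely let debugging loop).

```python
from enum import Enum


class Skill(Enum):
    """The five steps named above, in their fixed order."""
    FIND_SOURCE = 1
    CREATE_PIPELINE = 2
    DEBUG = 3
    VALIDATE = 4
    VIEW_RESULT = 5


class SkillChain:
    """Accepts skills only in their defined order; anything else is refused."""

    def __init__(self) -> None:
        self._order = list(Skill)  # Enum iteration follows definition order
        self._next = 0

    def run(self, skill: Skill) -> None:
        if self._next >= len(self._order):
            raise RuntimeError("chain already complete")
        expected = self._order[self._next]
        if skill is not expected:
            raise PermissionError(
                f"expected {expected.name}, got {skill.name}"
            )
        self._next += 1

    @property
    def done(self) -> bool:
        return self._next == len(self._order)


chain = SkillChain()
chain.run(Skill.FIND_SOURCE)
try:
    chain.run(Skill.VALIDATE)  # skipping ahead is rejected
except PermissionError as err:
    print(f"refused: {err}")
```

The point of the shape is that the agent never chooses the next step. It can only attempt the one the chain is waiting for.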

The genuinely new part is not that an agent can build a pipeline. It’s where the design effort has moved. A year ago the argument was whether agents could author pipeline code at all; this week two independent teams shipped the same answer to the question that replaced it, which is how you fence the agent in. The transferable idea survives even if you never touch dlt: your repeatable standards (medallion patterns, grain, naming, governance) belong in runtime-loaded skills the agent must follow, not buried in a long prompt it can quietly drift away from.[2]
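One way to picture standards as something the agent cannot skip is to hold them as checks that run against whatever it proposes. The Python sketch below is illustrative only: `PipelineSpec`, the bronze/silver/gold layer list, and the naming pattern are hypothetical house rules, not anything dlt or Databricks ships.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSpec:
    """What the agent proposes to build (hypothetical shape)."""
    table_name: str
    layer: str
    grain: tuple[str, ...]  # columns that define one row


# Hypothetical house rules: medallion layers and a layer-prefixed snake_case name.
ALLOWED_LAYERS = ("bronze", "silver", "gold")
NAME_PATTERN = re.compile(r"^(bronze|silver|gold)_[a-z][a-z0-9_]*$")


def violations(spec: PipelineSpec) -> list[str]:
    """Return every house rule the proposed pipeline breaks."""
    problems = []
    if spec.layer not in ALLOWED_LAYERS:
        problems.append(f"unknown layer: {spec.layer}")
    if not NAME_PATTERN.match(spec.table_name):
        problems.append(f"table name breaks naming rule: {spec.table_name}")
    elif not spec.table_name.startswith(spec.layer + "_"):
        problems.append("table name prefix does not match layer")
    if not spec.grain:
        problems.append("grain not declared")
    return problems


good = PipelineSpec("silver_orders", "silver", ("order_id",))
bad = PipelineSpec("OrdersFinal", "platinum", ())

print(violations(good))  # no rules broken
for problem in violations(bad):
    print(problem)
```

Rules written this way live in version control and fail loudly, which is the difference between a standard and a paragraph in a prompt.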

Bottom line: The teams shipping agent-built pipelines this week aren’t the ones with the cleverest prompt. They’re the ones who wrote their house rules down as skills the agent can’t skip.

The Radar

🤖 If you’re letting agents build pipelines. A second shop landed on the same shape as the dltHub story above: Daikin Applied built its pipelines with Databricks Genie Code by loading its house standards as runtime skills and treating the model as a scoped junior engineer. The productivity numbers are vendor-reported, so weigh them as such. The signal is that two independent teams reached for guardrails over autonomy in the same week.

🔍 If you care about observability. Monte Carlo moved data-quality checks into the editor and commit loop: downstream blast radius before an edit, monitors generated as code after, coverage gaps flagged before a merge. The question is worth sitting with before agents are writing your pipeline code at volume. Does reliability belong in the commit, not the post-mortem?

🧩 If you’re deciding where your metrics live. Two items circle the same call: a trade-press argument for building the AI semantic layer as a standalone metadata service rather than a feature inside your business-intelligence (BI) tool, and Salesforce exposing governed Tableau metrics to any agent through Headless Analytics over the Model Context Protocol (MCP), the emerging standard for wiring agents to data and tools. Keep your definitions portable before a BI vendor’s agent owns them.

🔒 If you’re about to switch on Copilots. Microsoft’s case for Fabric data protection is the unglamorous prerequisite nobody wants to do first: classify the data, set least-privilege access, and check what your catalog is oversharing before an agent can read across all of it at once. The advice is familiar. The reason to act now is that agents compound whatever oversharing you already have.

🦆 If you’re watching the substrate. Two more vendors put agent workloads on plain Postgres this week. cognee 1.0 collapses the separate vector and graph stores for agent memory onto a single Postgres, and Databricks made its case for serverless Postgres on the lakehouse. If you’re about to stand up a separate vector or graph store, the question worth asking first is whether it still earns its place.

How do you decide when to run OPTIMIZE today: a fixed nightly schedule, a file-count heuristic, or have you wired table-health metrics into the trigger? Reply and tell me what’s pulling the lever.

Published by RepublicOfData.io. Curated by Olivier Dupuis.

[1] The SQL analytics endpoint is read-only, so the procedure diagnoses but does not act. When a table is flagged, you still fire OPTIMIZE from a notebook or job. It’s a small general-availability release with no community discussion behind it yet, so treat it as a sharp tool rather than a proven cost program. The value is the measured-then-conditional pattern, which generalizes past Fabric.

[2] This is one vendor’s blog with no independent benchmark or community discussion behind it, so weigh the workflow as a direction, not a proof. The corroboration that the guardrail shape is hardening comes from a separate shop (Daikin, in the Radar) reporting the same pattern, with its own vendor-reported numbers. Two vendor-authored data points are a trend worth watching, not a settled result.
