The Operator’s Burden
The Data Report: Weekly State of the Market in Data Product Building | Week ending February 1, 2026
This week, the data community had a collective reckoning with what comes after the build. Vercel published benchmarks showing that coding agents need carefully compressed instruction manuals, not just access to tools. A legal analysis argued that “the AI hallucinated” is becoming an airtight defense because nobody can trace intent through multi-agent workflows. Reddit’s r/dataengineering lit up over Streamlit apps multiplying unchecked and the stubborn persistence of Airflow despite a decade of death notices.
The pattern across all of it: the industry is getting very good at making things. It’s not getting proportionally better at running them. Creation is fast, cheap, and accelerating. Operation is slow, expensive, and someone else’s problem, until it isn’t.
Four themes this week: how to configure AI tools for real work, why AI accountability is still a blind spot, what happens when self-serve mints too many builders, and why the boring tools keep winning.
Teaching Machines to Read the Manual
Before July 2025, every AI coding tool had its own instruction format. Cursor had .cursorrules. Windsurf had .windsurfrules. Claude had CLAUDE.md. If you wanted consistent behavior across tools, you maintained multiple files saying roughly the same thing. Then Google, OpenAI, Cursor, and Sourcegraph launched AGENTS.md as a unified standard under the Linux Foundation. One file to rule them all.
This week, Vercel published evaluation results that explain why the format works so well. They compared two approaches for teaching coding agents new Next.js 16 APIs: a tool-invoked skill (agent calls a docs tool when needed) and a compressed ~8KB index baked into AGENTS.md (always-on context). The compressed index hit a 100% pass rate. Skills managed 79%. The baseline without either: 53%.
The key finding is counterintuitive. You’d expect the sophisticated approach (tools that fetch docs on demand) to win. But every tool invocation is a decision point where the agent can fail to look things up, look up the wrong thing, or misinterpret what it finds. The compressed index removes all those decisions. It’s just there, in context, every time.
Meanwhile, OpenAI expanded ChatGPT’s containers to run Bash, install packages via pip and npm, and execute code in Ruby, Go, Java, and a dozen other languages. What started as Code Interpreter in 2023 is now a full development environment. The gap between “AI assistant” and “AI-powered IDE” keeps shrinking.
The operator’s burden here: these tools work in demos. Making them work reliably on your codebase requires explicit, carefully structured instruction files. Agent configuration is becoming its own discipline, closer to infrastructure-as-code than prompt engineering.
Try: If you’re using AI coding agents, experiment with a compressed AGENTS.md index for your project’s conventions. Test whether always-on context outperforms on-demand tool calls in your setup.
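As a rough illustration of what "compressed index" means in practice, a build script can distill doc files into a byte-budgeted block and splice it into AGENTS.md between markers. This is a hypothetical sketch, not Vercel's actual tooling: the marker names, the heuristic for which lines count as "dense signal," and the 8KB budget are all assumptions.

```python
from pathlib import Path

BUDGET = 8 * 1024  # ~8KB, roughly the size of the index in Vercel's eval
MARKER_START = "<!-- api-index:start -->"  # hypothetical marker convention
MARKER_END = "<!-- api-index:end -->"

def build_index(doc_dir: str, budget: int = BUDGET) -> str:
    """Concatenate doc lines into an always-on index, stopping at the byte budget."""
    lines: list[str] = []
    size = 0
    for doc in sorted(Path(doc_dir).glob("*.md")):
        for line in doc.read_text().splitlines():
            line = line.strip()
            # skip empty lines and comments; a real compressor would be smarter
            if not line or line.startswith("<!--"):
                continue
            encoded = len(line.encode()) + 1  # +1 for the joining newline
            if size + encoded > budget:
                return "\n".join(lines)
            lines.append(line)
            size += encoded
    return "\n".join(lines)

def inject(agents_md: str, index: str) -> str:
    """Replace the marked region of AGENTS.md with a freshly built index."""
    head, _, rest = agents_md.partition(MARKER_START)
    _, _, tail = rest.partition(MARKER_END)
    return f"{head}{MARKER_START}\n{index}\n{MARKER_END}{tail}"
```

The point of the marker pair is that the index can be rebuilt in CI whenever docs change, without touching the hand-written parts of AGENTS.md.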
The Accountability Gap
In February 2024, a Canadian tribunal ruled Air Canada liable for its chatbot’s incorrect bereavement fare advice. The company argued the chatbot was a separate entity. The court disagreed. Damages: CAN$812. The precedent: companies own what their AI says.
But that was a single chatbot giving a single wrong answer. This week, a legal analysis argued that “the AI hallucinated” is becoming a much harder defense to challenge in agentic workflows. When an AI agent chains actions across multiple systems (read a database, call an API, write to a file, send a notification), logs show events but not authorization. Nobody signed off on the specific sequence. Scope and intent get diffused across hops. The post proposes “Tenuo Warrants,” cryptographic authorization objects that bind humans to specific agent actions with signed receipts.
The problem is real. In 2025, an AI agent at an unnamed company deleted a production database and then continued destroying multiple systems. Who authorized that? The person who started the agent? The person who configured it? The person who deployed it?
On the observability side, a new tool called Sherlock (since renamed Tokentap) offers a MitM proxy that intercepts HTTPS calls to LLM APIs and displays real-time token usage in a terminal dashboard. It exists because developers literally cannot see what their coding agents send to API endpoints. The 119-comment Hacker News discussion surfaced a sharp debate: is verbose agent behavior a model quirk, or is it intentional design to increase token spend?
LLM observability has grown into a real category since LangSmith launched in July 2023. Langfuse (19K+ GitHub stars, open source), Helicone, and Arize Phoenix all track traces, tokens, and costs. But none of them solves the authorization problem. They tell you what happened. They can’t tell you who decided it should happen.
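The execution-log half of the problem is easy to start on yourself, even before adopting a full observability platform. A minimal sketch of request-level instrumentation might look like the following; the field names and per-1K-token pricing parameters are hypothetical, not any vendor's actual API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TraceLog:
    """Minimal request-level trace: what was sent, by which step, at what cost."""
    entries: list = field(default_factory=list)

    def record(self, *, caller: str, model: str, prompt: str,
               prompt_tokens: int, completion_tokens: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> dict:
        """Log one LLM call; call this from the wrapper around your API client."""
        entry = {
            "ts": time.time(),
            "caller": caller,  # which agent or pipeline step issued the call
            "model": model,
            "prompt_chars": len(prompt),
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "cost_usd": round(prompt_tokens / 1000 * usd_per_1k_in
                              + completion_tokens / 1000 * usd_per_1k_out, 6),
        }
        self.entries.append(entry)
        return entry

    def total_cost(self) -> float:
        """Sum spend across all recorded calls."""
        return round(sum(e["cost_usd"] for e in self.entries), 6)
```

Even this much answers the question the Tokentap discussion kept circling: you can't debate whether your agent's verbosity is a quirk or a cost driver until you can see the per-call token counts.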
The EU AI Act’s full compliance framework for high-risk AI takes effect in August 2026. Courts are increasingly holding vendors liable (the Workday discrimination case in 2024-2025 was the first time a vendor, not just a deployer, was held directly responsible). But enforcement still faces the same causation challenge: proving who authorized what in a multi-agent chain.
Watch: If you’re deploying AI agents in production, instrument your API calls now. Know what’s being sent and how much it costs. And start thinking about authorization trails, not just execution logs.
More Builders, More Problems
Streamlit launched in 2019 and hit 200,000 applications within eight months of open-sourcing. Snowflake acquired it in 2022, integrating it directly into the platform. The pitch: anyone with Python skills and Snowflake access can ship a data app.
This week, a practitioner on r/dataengineering raised the governance consequences. Each new Streamlit app can spawn its own Snowflake database and tables. Nobody tracks who built what. Access patterns multiply. Costs creep. The 24-comment discussion converged on a familiar tension: Streamlit is great for prototypes, but production deployment without guardrails creates sprawl that the platform team inherits.
Gartner projects that by 2027, 75% of employees will acquire or create technology outside IT’s visibility, up from 41% in 2022. This isn’t rebellion. It’s what happens when official platforms are slower than the workaround. Shadow analytics (the analyst’s spreadsheet that becomes the trusted source of truth) has always existed. AI tooling is just accelerating the pattern.
In the same week, a Reddit thread asked how data practitioners should adapt to the “full stack” push. Organizations want generalists who handle ingestion, modeling, and AI features end-to-end. The 99-comment discussion was less about whether this is happening (it is) and more about what to do about it. The consensus: add AI engineering and product skills, but push for platform investment that prevents every new builder from reinventing infrastructure.
OpenAI’s ChatGPT container expansion fits the same pattern. When a chatbot can run bash, install packages, and execute code in a dozen languages, the barrier to building drops further. That’s good for velocity. The operator’s burden is everything that comes after: maintaining, securing, and keeping coherent the artifacts that all these new builders produce.
Watch: If your organization is enabling self-serve builders (through Streamlit, AI coding tools, or low-code platforms), invest equally in the platform layer. Governance, resource management, and deployment standards aren’t optional. The bottleneck shifts from “not enough builders” to “not enough coherence.”
The Tools That Persist
Someone told a data engineer that nobody uses Airflow or Hadoop in 2026. The Reddit response was swift and decisive: Airflow is everywhere. Hadoop, less so, but that’s a different conversation.
The numbers back the community up. Airflow hit 320 million downloads in 2024, 10x Prefect’s 32M and over 20x Dagster’s 15M. Over 80,000 organizations use it, up from 25,000 in 2020. 92% of users would recommend it. The “Airflow is dead” narrative has been running since roughly 2018, when real pain points (scheduler limitations, developer experience, batch-only design) drove teams to evaluate alternatives.
But Airflow adapted. Version 2.0 in December 2020 rewrote the scheduler, added the TaskFlow API, and improved the REST interface. Airflow 3.0 in April 2025 was the biggest release in the project’s history: DAG versioning, multi-language Task SDKs, and event-driven scheduling. It borrowed ideas from competitors (Dagster’s asset-centric approach, Prefect’s developer ergonomics) and shipped them into the tool that already had the community and ecosystem.
Dagster and Prefect found real niches. Dagster’s asset-centric model and Components framework (GA October 2025) serve teams that want data awareness baked into orchestration. But Prefect’s commit activity has been declining since mid-2021. The orchestrator wars didn’t produce an Airflow killer. They produced an Airflow that absorbed the best ideas from its challengers.
Separately, Henrik Warne’s post praising the --dry-run flag drew 88 comments about safe-by-default design. The pattern isn’t new (Terraform’s plan, Docker Compose’s config, AWS CLI’s --dry-run all predate this). Gary Bernhardt’s “functional core, imperative shell” screencast laid out the architecture in 2012. But the discussion showed that the community values these patterns more than ever. When you can spin up a pipeline in minutes with AI assistance, the ability to preview what it’ll do before it does it becomes critical safety infrastructure.
Both stories point to the same thing: the tools and patterns that persist are the ones built for operators. Airflow survives because it works at scale in production, not because it wins feature comparisons. --dry-run persists because it respects the operator’s need to verify before committing. In a week defined by the gap between creation and operation, these are the tools that close it.
Adopt: Add --dry-run or equivalent safe-by-default flags to your CLIs and pipeline tooling.
Understand: Evaluate orchestrators on operational fit and ecosystem depth, not marketing narratives. Airflow 3.0 is worth a fresh look if you dismissed it based on 2018-era complaints.
The Thread
The data ecosystem keeps getting better at starting things. New agents, new dev environments, new self-serve tools, new builders entering the field every week. That’s not the hard part anymore.
The hard part is what comes next. Configuring agents so they don’t hallucinate your API conventions. Building authorization trails for actions no human explicitly approved. Governing the Streamlit apps and pipelines that multiply when everyone can ship. Keeping the orchestrators running that were declared dead years ago but still power the work.
Creation is cheap. Operation is where the debt accrues. The teams that invest in the operator’s burden (the instruction files, the observability, the governance, the --dry-run flags) are the ones whose systems will still be running next year.


