Exit Strategies
The Data Report: Weekly State of the Market in Data Product Building | Week ending January 25, 2026
The modern data stack sold us on flexibility. Pick the best tool for each layer. Swap components when something better comes along. Loosely coupled, easily replaced.
That was the pitch. This week’s stories reveal what that flexibility actually costs.
Fivetran’s new pricing model is pushing teams to model their exit. Practitioners are sharing techniques for validating 30-billion-row migrations. The OLAP landscape beyond Snowflake and BigQuery has quietly expanded into a constellation of specialized engines. And in the AI agent world, the debate between comprehensive frameworks and code-only simplicity is partly about avoiding dependencies you can’t shed.
The original MDS promise (interoperability, best-of-breed) turns out to require active maintenance. Every tool choice should include an exit strategy.
This week: vendor volatility, migration readiness, the new OLAP options, and the agent architecture debate.
Vendor Volatility
Exit strategies start with knowing what you’re locked into. For many teams, the first test case just arrived.
Fivetran’s March 2025 pricing shift changed how Monthly Active Rows (MAR) are calculated: from account-level to per-connector. The result? Teams with many low-volume connectors (the long tail of SaaS integrations most companies accumulate) saw bills jump 40-70%, with some reporting increases over 200%.
This week, a practitioner’s detailed breakdown of the impact sparked one of the more active discussions in r/dataengineering. The math is straightforward: if you have 20 connectors pulling under 1M rows each, you no longer benefit from bulk discounts. Each connector now stands alone.
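The pooling effect is easy to see in a few lines. The tier boundaries and rates below are illustrative placeholders, not Fivetran's actual price book; the point is only how a marginal tier schedule behaves when rows are pooled versus counted per connector:

```python
# Illustrative tier schedule: (rows_up_to, price_per_million_rows).
# These numbers are made up for the sketch, not vendor pricing.
TIERS = [
    (1_000_000, 500.0),
    (10_000_000, 300.0),
    (float("inf"), 150.0),
]

def tiered_cost(rows: int) -> float:
    """Cost of `rows` MAR under a marginal (bracketed) tier schedule."""
    cost, prev_cap = 0.0, 0
    for cap, rate in TIERS:
        band = min(rows, cap) - prev_cap
        if band <= 0:
            break
        cost += band / 1_000_000 * rate
        prev_cap = cap
    return cost

# 20 connectors at 900k rows each, per the example above.
connectors = [900_000] * 20

# Pooled: most rows fall into the cheaper tiers.
account_level = tiered_cost(sum(connectors))
# Per-connector: every connector is billed entirely in the top tier.
per_connector = sum(tiered_cost(r) for r in connectors)

print(f"account-level: ${account_level:,.0f}")
print(f"per-connector: ${per_connector:,.0f}")
```

Under this toy schedule the same 18M rows roughly double in cost once discounts stop aggregating, which is the shape of the bill increases practitioners are reporting.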
The alternatives are getting attention: Airbyte (open source, self-hosted), dlt (Python-native, lightweight), Weld (fixed monthly pricing), and Portable (focused on long-tail connectors Fivetran doesn’t prioritize). The pattern isn’t unique to Fivetran. Managed services across the stack face pressure to expose their true cost structures, and teams are learning that “easy setup” has a variable price tag.
Watch: If you’re a Fivetran customer, model your per-connector MAR before renewal. If you’re evaluating EL tools, factor pricing model stability into your decision. The managed convenience premium is real, but so is the migration cost when that premium changes.
Migration Readiness
Knowing you might need to leave is one thing. Actually being able to leave is another.
Two stories this week touched the same nerve: the technical capabilities that make exits possible. The first was a practitioner asking how to validate a 30-billion-row table migration in Databricks. Row-by-row comparison is infeasible at that scale. The community’s answer: bucket-hash checksums (xxhash64 of a canonicalized row, grouped by hash bucket), per-column statistics (null ratios, min/max, approx_count_distinct), and selective anti-joins only where buckets differ.
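In Databricks you would express this with Spark SQL's `xxhash64` and a `GROUP BY` on the bucket; the minimal sketch below shows the same idea in plain Python, with `hashlib.blake2b` standing in for xxhash64 and an XOR fold making each bucket checksum order-independent:

```python
import hashlib
from collections import defaultdict

def canonical(row: dict) -> str:
    """Canonicalize a row: fixed column order, explicit NULL marker."""
    return "|".join("\\N" if row[c] is None else str(row[c]) for c in sorted(row))

def bucket_checksums(rows, n_buckets=16):
    """Per-bucket XOR of 64-bit row hashes. XOR is order-independent, so
    source and target can be scanned in any order. hashlib stands in for
    Spark's xxhash64 here."""
    sums = defaultdict(int)
    for row in rows:
        digest = hashlib.blake2b(canonical(row).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        sums[h % n_buckets] ^= h
    return dict(sums)

source = [{"id": i, "v": i * 2} for i in range(1000)]
target = list(source)
target[500] = {"id": 500, "v": 999}  # inject one discrepancy

src, tgt = bucket_checksums(source), bucket_checksums(target)
diff = sorted(b for b in set(src) | set(tgt) if src.get(b) != tgt.get(b))
print(f"buckets needing anti-join: {diff}")
```

Only the buckets that disagree need the expensive anti-join, which is what makes the approach tractable at 30 billion rows.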
The second was the perennial question of escaping Jupyter notebooks for production pipelines. The answers have evolved: marimo for reactive notebooks that feel like production code, nbdev for literate programming that syncs notebooks with packages, Dagster and Prefect for orchestration that doesn’t require rewriting everything.
The thread connecting these: migration readiness is becoming a core skill. With tool fragmentation comes the need for portability. Teams that can validate large moves and transition workflows without burning everything down have optionality. Teams that can’t are stuck.
Adopt: For migrations over 1B rows, statistical validation is mandatory. For notebook-heavy workflows, evaluate marimo or nbdev before the next replatforming project forces your hand.
The New OLAP Landscape
If you’ve been building on Snowflake, BigQuery, or Redshift, the OLAP market has quietly expanded around you. Time to catch up.
A discussion this week about building a blockchain data provider API compared ClickHouse, DuckDB, and Apache Doris. The requirements: ~15TB per chain, sub-500ms query latency, event searches over block ranges. The interesting part wasn’t the specific choice (ClickHouse for range scans won out) but that practitioners now routinely evaluate multiple OLAP engines for fit.
Here’s the landscape:
ClickHouse is the columnar analytics engine that processes logs and events at scale. Open source, vectorized execution, 10-100x I/O reduction for selective queries. The trade-off: complex JOINs are slower, ops burden is higher. Best for append-only data and simple aggregations.
DuckDB is the “SQLite of analytics.” In-process, zero dependencies, queries Parquet and CSV directly. Performance matches ClickHouse for single-node workloads. The limit: no distributed queries, so it caps out at single-machine scale.
Apache Doris (and its fork, StarRocks) fills the gap: real-time OLAP with strong JOIN performance and high concurrency. MySQL-compatible. Best for teams needing updates, materialized views, and mixed workloads.
The Big Three cloud warehouses aren’t going anywhere. But for specific access patterns (API-served analytics, embedded analytics, real-time dashboards), specialized engines often fit better and cost less.
Try: If you’re building an analytics API or embedded product, benchmark ClickHouse and DuckDB against your actual queries. Start local, measure, then scale.
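A benchmark harness for this doesn't need to be elaborate. The sketch below times arbitrary queries through a pluggable `run_query` callable; it uses in-memory SQLite purely as a runnable stand-in, and you would swap in a ClickHouse or DuckDB client cursor to compare real candidates on your actual queries:

```python
import sqlite3
import statistics
import time

def benchmark(run_query, queries, repeats=5):
    """Time each named query `repeats` times; report median and worst-case
    latency in ms. `run_query` is any callable that executes one query
    against the engine under test."""
    results = {}
    for name, sql in queries.items():
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run_query(sql)
            times.append((time.perf_counter() - t0) * 1000)
        results[name] = {"p50_ms": statistics.median(times), "max_ms": max(times)}
    return results

# Stand-in engine and toy event table; replace with your real data and engine.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (block INTEGER, topic TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i, f"t{i % 10}") for i in range(100_000)])

queries = {
    "range_scan": "SELECT COUNT(*) FROM events WHERE block BETWEEN 5000 AND 15000",
    "group_by":   "SELECT topic, COUNT(*) FROM events GROUP BY topic",
}
report = benchmark(lambda q: con.execute(q).fetchall(), queries)
for name, stats in report.items():
    print(name, stats)
```

The discipline matters more than the tooling: measure your own range scans and aggregations, not a vendor's benchmark suite.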
Agent Patterns vs Agent Complexity
The final exit strategy isn’t about vendors. It’s about dependencies you’re building into your own systems.
The AI agent world is split. On one side: teams codifying production patterns into handbooks and frameworks. On the other: practitioners arguing that the complexity itself is the problem.
This week, The Agentic AI Handbook cataloged 113 patterns for reliable agent deployment. A key problem it addresses: context drift, nicknamed the “Ralph Wiggum loop” after the pattern of reinjecting prompts until the model decides it’s done. The solution? Human-in-the-loop checkpoints, observability, and control transfer protocols. The handbook is comprehensive. It’s also a sign of how much machinery production agents apparently require.
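The checkpoint pattern itself is simple; roughly, a bounded loop that hands control to a reviewer before each continuation. This sketch is a generic illustration of the idea, not code from the handbook; `step` and `approve` are hypothetical callables standing in for an agent iteration and a human prompt:

```python
def run_with_checkpoints(step, approve, max_iters=5):
    """Bound the loop and require reviewer sign-off between iterations:
    two guards against an agent reinjecting prompts indefinitely."""
    state = None
    for i in range(max_iters):
        state, done = step(state)
        if done:
            return state
        if not approve(i, state):  # human-in-the-loop checkpoint
            raise RuntimeError(f"halted by reviewer at iteration {i}")
    raise RuntimeError("iteration cap reached without completion")

# Toy step that counts to 3; auto-approval stands in for a human reviewer.
result = run_with_checkpoints(
    step=lambda s: ((s or 0) + 1, (s or 0) + 1 >= 3),
    approve=lambda i, s: True,
)
print(result)
```

The iteration cap and the approval hook are the whole trick: the model never gets to unilaterally decide "it's done" forever.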
The counterargument came from two other stories. The Code-Only Agent proposes stripping agents to a single tool: execute_code. Every task becomes a “code witness,” a runnable artifact that’s auditable and reproducible. No tool orchestration, no framework dependencies. Similarly, Composing APIs and CLIs in the LLM era argues for letting agents use shell commands instead of bespoke integrations.
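The code-only shape can be sketched just as briefly. Assuming the single-tool design the article describes, the entire tool surface reduces to something like this; a real version would sandbox execution (subprocess, container) rather than `exec` in-process:

```python
import contextlib
import io
import traceback

def execute_code(source: str) -> dict:
    """The one and only tool: run Python, capture stdout, and return a
    reproducible 'code witness' (source plus output) for auditing.
    No sandboxing here; this is a sketch, not a production runner."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(source, {"__builtins__": __builtins__})
        return {"ok": True, "source": source, "output": buf.getvalue()}
    except Exception:
        return {"ok": False, "source": source, "output": traceback.format_exc()}

# An agent loop would have the LLM emit `source`; here one step is hard-coded.
witness = execute_code("print(sum(range(10)))")
print(witness["ok"], witness["output"].strip())
```

Everything the agent does leaves a runnable artifact behind, which is the auditability argument: no hidden tool-call plumbing to reverse-engineer later.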
The tension is real. Frameworks solve problems (context drift, reliability, observability) that simpler architectures might avoid entirely. And simpler architectures are easier to exit.
Understand: Before adopting a heavy agent framework, test whether a code-only approach meets your needs. The 113 patterns are a valuable reference, but many exist to solve problems that minimal architectures sidestep.
The Thread
The modern data stack started as a promise: best-of-breed tools, loosely coupled, easy to swap. That promise assumed the coupling would stay loose and the swaps would stay easy.
This week’s stories suggest both assumptions need active maintenance. Fivetran’s pricing change is a reminder that vendor terms can shift mid-contract. The OLAP landscape’s expansion means more options but also more evaluation work. Migration validation at scale requires statistical techniques that most teams haven’t practiced. And even in the agent space, the debate about frameworks versus simplicity is partly about avoiding dependencies that become liabilities.
The MDS isn’t dead. But its original principle (interoperability, flexibility) now demands explicit investment. Exit strategies aren’t pessimism. They’re the cost of optionality in a market that keeps fragmenting.
Build accordingly.