The Protocol Wars Ended Before They Started
The Data Report - Week ending December 14, 2025
Anthropic, OpenAI, and Block agreed on a standard for AI agents this week. Meanwhile, a quieter pattern emerged across several stories: teams are opting for simpler architectures over distributed complexity, and databases are absorbing capabilities that previously required separate systems.
Here’s what matters for data product builders.
Agents Get a Common Language
The Model Context Protocol is now under neutral governance. Block, Anthropic, and OpenAI co-founded the Agentic AI Foundation under the Linux Foundation, with Google, Microsoft, AWS, and Cloudflare as supporters.
The adoption numbers are already significant: 97 million monthly SDK downloads, 10,000 active servers, and support across ChatGPT, Claude, Gemini, Copilot, and VS Code. The new spec adds Tool Search for managing thousands of tools and Programmatic Tool Calling for complex agent workflows.
Standards usually arrive after a protocol war, not before one. This time, the major players agreed before fragmentation could set in. That rarely happens.
For data product builders, this matters because AI agents increasingly need to talk to your stack—querying warehouses, triggering pipelines, calling transformation logic. MCPShark already exists for debugging agent-to-tool traffic. tomcp.org turns any URL into an MCP server. OpenAI quietly added skills that mirror Anthropic’s spec, making automations portable across providers.
If you’re building integrations for AI agents, MCP is the interface to target. The bet looks increasingly safe.
Simplicity Keeps Winning
Several stories this week point to the same pattern: teams are moving away from distributed complexity when they don’t need the scale.
Twilio Segment moved from microservices back to a monolith. Their event-forwarding system used a shared queue that mixed fresh traffic and retries for 100+ destinations. When one destination went down, its retries flooded the queue and head-of-line blocked every other destination. A single service simplified testing, deployment, and scaling for a small team.
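The isolation idea behind that fix can be sketched in a few lines: give each destination its own queue so a failing destination's retries wait in their own lane instead of blocking everyone. This is an illustrative sketch of the concept, not Segment's actual code; all names are made up.

```python
from collections import defaultdict, deque

class PerDestinationQueues:
    """Hypothetical sketch: one queue per destination, so retries for a
    failing destination can't head-of-line block deliveries to the rest."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, destination, event):
        self.queues[destination].append(event)

    def drain(self, destination, send):
        """Try to deliver this destination's backlog. Failed sends are
        re-queued locally; only this destination's events wait."""
        q = self.queues[destination]
        delivered = []
        for _ in range(len(q)):
            event = q.popleft()
            if send(destination, event):
                delivered.append(event)
            else:
                q.append(event)  # retry later, in this lane only
        return delivered

queues = PerDestinationQueues()
queues.enqueue("dest_a", "event-1")
queues.enqueue("dest_b", "event-2")
# dest_b is down: its event stays queued, dest_a still drains
sent = queues.drain("dest_a", lambda dest, e: dest != "dest_b")
```

The same isolation can live inside a monolith or across services; the point is that the queue topology, not the deployment unit, decides whether one outage blocks everything.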
The SQLite ecosystem keeps expanding into territory that used to require heavier infrastructure. Litestream VFS lets you query SQLite directly from S3 without restoring the full database—instant point-in-time recovery via PRAGMA litestream_time. Generated columns with indexes give you B-tree performance on JSON fields without duplicating storage.
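The generated-column trick works with nothing but Python's stdlib sqlite3 module (it needs SQLite 3.31+ for generated columns; the schema and names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A VIRTUAL generated column extracts a JSON field at query time;
# indexing it gives B-tree lookups without storing the value twice.
conn.execute("""
    CREATE TABLE events (
        id      INTEGER PRIMARY KEY,
        payload TEXT,
        user_id TEXT GENERATED ALWAYS AS
                (json_extract(payload, '$.user_id')) VIRTUAL
    )
""")
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")

conn.execute("INSERT INTO events (payload) VALUES (?)",
             ('{"user_id": "u42", "action": "click"}',))

# Filtering on the generated column hits the index, not a full scan
row = conn.execute(
    "SELECT json_extract(payload, '$.action') FROM events WHERE user_id = ?",
    ("u42",)).fetchone()
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 'u42'"
).fetchone()
```

The EXPLAIN QUERY PLAN output should show a SEARCH using idx_events_user, confirming the B-tree path the article describes.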
sql-flow runs DuckDB SQL over Kafka topics. Test your configs against fixture data, then deploy as a Dockerized daemon. It’s stream processing without Flink’s operational weight.
The common thread: simpler architectures with fewer moving parts. Microservices, distributed databases, and complex streaming frameworks have real costs. If your scale doesn’t demand them, you’re paying overhead for capabilities you’re not using.
Databases Are Absorbing Everything
Another pattern across this week’s stories: databases are taking on capabilities that used to require separate systems.
VectorChord indexed 100 million 768-dimensional vectors on PostgreSQL in 20 minutes using 16 vCPU and 12GB RAM. For comparison, pgvector needed ~40 hours and ~200GB for the same job. If you’re building semantic search or RAG into your data product, you may not need a separate vector database anymore.
pg_clickhouse is a new Postgres FDW that runs analytics queries on ClickHouse while presenting tables in a Postgres schema. Keep your OLTP in Postgres, push heavy analytics to ClickHouse, and query both through one interface. Useful for moving read-heavy workloads off your primary without changing your application code.
MotherDuck’s piece on Git for data explores branching datasets: clone production data, test transformations in isolation, discard or merge when ready. It requires storage-level versioning (lakeFS, Nessie, Dolt, or zero-copy clones) plus branch-aware orchestration. We’re not fully there yet, but the tooling is maturing.
For data product builders, the implication is fewer systems to integrate and operate. Postgres with the right extensions can handle OLTP, analytics pushdown, vector search, and JSON querying. That’s a lot of capability in one place.
Quickfire
IBM is acquiring Confluent for $31/share all-cash. The announcement says Confluent stays a distinct brand, but Kafka now sits alongside Red Hat and HashiCorp in IBM’s portfolio. If you’re on Confluent Cloud, review your contracts for pricing and SLA implications.
Object storage costs sneak up on AI workloads. A new entrant explains why: ~60% of AI dataset objects are under 512KB, so you’re paying per-request, not per-byte. S3 Express One Zone at 10k PUT/s runs ~$29k/month in request fees alone. Audit your cost breakdown if your feature store or model registry does lots of small writes.
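The arithmetic behind that figure is worth running yourself. The sketch below assumes a PUT price of $0.00113 per 1,000 requests, the rate consistent with the article's ~$29k number; check current AWS pricing before relying on it.

```python
# Back-of-envelope: request fees dominate when objects are small,
# because you pay per PUT regardless of object size.
put_rate_per_s = 10_000                 # sustained PUT/s from the article
seconds_per_month = 30 * 24 * 3600      # 2,592,000
price_per_1k_puts = 0.00113             # USD, assumed rate (verify!)

puts_per_month = put_rate_per_s * seconds_per_month
request_cost = puts_per_month / 1_000 * price_per_1k_puts
print(f"${request_cost:,.0f}/month in PUT request fees")  # ≈ $29k/month
```

Swap in your own write rate and the per-byte storage line, and it's usually obvious whether your bill is request-bound or capacity-bound.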
Terraform CDK is EOL. HashiCorp sunset it on December 10. Export via cdktf synth --hcl and migrate to standard Terraform.
A cautionary tale on public datasets. A developer was banned by Google after uploading an AI training dataset that, unknown to him, contained CSAM. He reported it to the authorities himself; the ban stuck anyway. If you're working with public datasets, scan them before uploading to consumer cloud services.
What to Watch
The Agentic AI Foundation is the story to track. Protocol standards live or die on governance, and we haven’t seen the first major dispute yet. But the starting position—competitors agreeing before fragmentation—is better than most standards efforts get.
The simplicity trend is worth paying attention to. If your architecture diagram has a lot of boxes, ask whether each one is earning its operational cost. Sometimes a monolith, SQLite, or DuckDB is the right answer.
And keep an eye on your Postgres extensions. The ecosystem is absorbing capabilities fast. Vector search, analytics pushdown, JSON indexing—a lot of what used to require separate systems now fits in one place.