DAILY BRIEFING · SATURDAY, MAY 23, 2026
Agentic AI moves from the edges to the core of the platform — multi-agent protocols land in streaming, runtime guardrails reach GA in the warehouse, and observability vendors begin executing remediation without a human in the loop.
⚡ QUICK TAKES
| Story | Signal |
|---|---|
| ↗ Confluent ships A2A protocol for multi-agent networks | The streaming bus becomes the agent bus. |
| ↗ Redpanda posts 70% ARR growth on agentic data plane | Kafka-compat rebrands as AI substrate, with results. |
| ↗ Estuary: production Debezium is "far from set-and-forget" | CDC operational tax becomes the buying criterion. |
| ↗ RisingWave claims 22 of 27 Nexmark wins over Flink | Streaming database model challenges DAG orthodoxy. |
| ↗ dbt Labs ships new Semantic Layer YAML, Fusion in Core 1.12 | Semantic models collapse into model YAML entries. |
| ↗ DuckDB 1.5.2 lands DuckLake; Polars + DuckDB stack matures | In-process is now production for medium-scale analytics. |
| ↗ Snowflake Batch Cortex Search GA on May 18 | Millions of fuzzy lookups in one SQL statement. |
| ↗ Snowflake adds column-level lineage to dbt DAG view | Horizon Catalog wires lineage into the dev loop. |
| ↗ Databricks Native Lakehouse Sync goes preview on Autoscaling | Postgres WAL writes directly to Delta — zero pipeline. |
| ↗ Apache Iceberg v3 GA on Snowflake (May 7) | VARIANT, deletion vectors, row lineage in production. |
| ↗ Open lakehouse "no longer experimental" in 2026 | Federated catalogs like Polaris reshape the pattern. |
| ↗ Teradata launches Autonomous Knowledge Platform | Sovereign-AI play with Dell/NVIDIA on-prem option. |
| ↗ Cortex AI Guardrails GA for Snowflake Intelligence | Prompt-injection defense at the warehouse boundary. |
| ↗ AtScale Semantic Layer Summit puts agentic analytics center-stage | 8,500+ practitioners; OSI and AI take the keynote. |
| ↗ Context architecture replaces RAG for agent workloads | Agents pull data at runtime, not as pre-loaded payload. |
| ↗ Astronomer + IBM OEM Airflow for regulated industries | Joint launch material claims a 70% downtime cut. |
| ↗ Acceldata's Autonomous Data & AI Platform reaches GA | Agents detect, diagnose, and remediate without alerts. |
| ↗ CTERA InsightAI brings agentic management to unstructured data | Audit logs + permissions analyzed in natural language. |
| ↗ Soda 4.0 fuses observability with a contracts engine | Contracts + anomaly checks ship in one open-source core. |
| ↗ OpenMetadata 1.12.8 hardens Unity Catalog & Iceberg ingest | CVE patches and Iceberg property surfacing land May 13. |
| ↗ Informatica ties Iceberg governance into Snowflake | "Build once, deploy anywhere" row-level policy expands. |
TechTarget · May 2026
Confluent Intelligence now supports Google's Agent2Agent (A2A) protocol in open preview, letting Streaming Agents orchestrate task handoffs across heterogeneous agent frameworks over Kafka topics. Paired with Multivariate Anomaly Detection, the move positions the streaming bus as the orchestration spine for multi-agent enterprise networks, and it gives platform teams a single place to govern, secure, and observe agent traffic alongside data traffic.
Yahoo Finance / Redpanda · May 2026
Redpanda's FY27 Q1 numbers show 70% ARR growth, with the company crediting demand for "data and governance infrastructure needed to deploy AI agents safely at scale." Kafka compatibility has been deliberately backgrounded in favor of the Agentic Data Plane (MCP, A2A, AI Gateway, OIDC-based identity, OpenTelemetry traces) — a signal that the streaming vendor positioning fight is now over agent governance, not throughput.
Estuary · April 2026
Estuary's field-report takedown catalogs the operational tax of running Debezium at scale: Kafka Connect lifecycle, replication-slot management, snapshot semantics, schema-registry coordination, and the lack of native idempotent sinks. The piece is pointed marketing, but the failure modes it lists are the same ones platform teams hit in week 12 of a Postgres-to-Snowflake build — and they're shaping the buying conversation around CDC managed services.
RisingWave · May 2026
The streaming-database vendor claims RisingWave outperforms Flink on 22 of 27 Nexmark queries, with the gap widest on multi-stream joins (10+ inputs) where Flink's RocksDB-backed state management struggles. Beyond the benchmark argument, the piece frames a real architectural choice for platform teams: PostgreSQL-compatible streaming database with built-in storage versus distributed DAG framework — and what each implies for ops, replay, and integration with downstream lakehouses.
dbt Labs · May 2026
The May release introduces a new Semantic Layer YAML spec that embeds semantic models inside model YAML entries, promotes measures to simple metrics, and lifts frequently-used options to top-level keys; the spec ships in dbt Core v1.12 and on the platform "Latest" track. With the Fivetran merger still pending close, the rapid spec evolution and Fusion engine availability through dbt Projects on Snowflake suggest dbt is locking in semantic-layer relevance before vendor consolidation lands.
PyInns · 2026
DuckDB 1.5.2 (April 2026) brings the DuckLake extension to production and continues to harden SQL-first workflows, while Polars wins on transformation-heavy, programmatic logic. The two engines now share Arrow-native memory with zero-copy handoff, making "DuckDB for relational, Polars for dataframe" a defensible production split for medium-scale jobs that used to need Spark.
Snowflake Docs · May 2026
The new CORTEX_SEARCH_BATCH table function executes millions of fuzzy-match queries against a Cortex Search Service in a single SQL statement, with separate batch compute that doesn't degrade interactive serving. Snowflake is targeting entity resolution, deduplication, and clustering workloads that previously required external pipelines — and crucially, the service can be queried in batch and interactive mode concurrently, with batch jobs able to hit suspended serving instances.
Snowflake Docs · May 2026
The May 19 release renders each dbt DAG node with its columns sourced from Horizon Catalog; selecting a column highlights every upstream and downstream model touching it. dbt Projects on Snowflake also now supports the dbt Fusion engine at no extra license cost. For platform teams, lineage moves from an after-the-fact catalog problem to a development-time view inside the editor.
Databricks · May 2026
Lakehouse Sync decodes Lakebase's Postgres write-ahead log and writes directly to Unity Catalog managed Delta tables as SCD Type-2 history, with no Kafka Connect, no Debezium, and no external compute. It is enabled by a schema-level toggle, with claimed zero impact on Postgres and no extra cost. If it holds up, this is the simplest OLTP-to-lakehouse CDC path on the market, and a direct shot at the standalone CDC vendor stack.
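For readers less familiar with the term, here is a minimal sketch of how a stream of row changes becomes SCD Type-2 history. It is an illustration of the general technique only, not Databricks' implementation: the `Change` and `HistoryRow` types, the field names, and the integer timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One decoded change event (hypothetical shape, not a real WAL format)."""
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new column values; None for deletes
    ts: int               # commit timestamp, assumed unique per key


@dataclass
class HistoryRow:
    """One version of a row, valid over [valid_from, valid_to)."""
    key: int
    row: dict
    valid_from: int
    valid_to: Optional[int] = None  # None marks the current version


def apply_changes(changes: list[Change]) -> list[HistoryRow]:
    """Turn change events into SCD Type-2 history rows."""
    history: list[HistoryRow] = []
    current: dict[int, HistoryRow] = {}  # open version per key
    for ch in sorted(changes, key=lambda c: c.ts):
        # Any change to a key closes that key's open version.
        open_row = current.pop(ch.key, None)
        if open_row is not None:
            open_row.valid_to = ch.ts
        # Inserts and updates open a new version; deletes do not.
        if ch.op in ("insert", "update"):
            new_row = HistoryRow(ch.key, dict(ch.row), ch.ts)
            history.append(new_row)
            current[ch.key] = new_row
    return history


changes = [
    Change("insert", 1, {"plan": "free"}, ts=10),
    Change("update", 1, {"plan": "pro"}, ts=20),
    Change("delete", 1, None, ts=30),
]
history = apply_changes(changes)
for version in history:
    print(version)
```

Each update closes the previous version rather than overwriting it, which is what lets downstream queries ask what a row looked like at a given point in time.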
Snowflake Docs · May 2026
With v3 GA, Snowflake gets VARIANT for semi-structured payloads in a relational table, deletion vectors for faster CDC, row-level lineage, default values, and richer types, most of which already align with Delta v4. The story for architects is convergence: the same physical Iceberg table now exposes the AI-shaped surface (VARIANT, lineage) that previously required separate JSON columns or external doc stores.
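As a rough illustration of why deletion vectors help CDC-style workloads, the sketch below marks deleted row positions beside an immutable data file instead of rewriting the file. It is a conceptual toy under stated assumptions: the `DataFile` class is hypothetical, and a Python `set` stands in for the compressed bitmap a real table format would store.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """An immutable data file plus a side structure of deleted positions."""
    rows: list
    deleted: set = field(default_factory=set)  # stand-in for a deletion bitmap

    def delete_where(self, predicate) -> None:
        # Record positions of matching rows; the data itself is never rewritten.
        for pos, row in enumerate(self.rows):
            if predicate(row):
                self.deleted.add(pos)

    def scan(self) -> list:
        # Readers skip any position marked as deleted.
        return [row for pos, row in enumerate(self.rows) if pos not in self.deleted]


data_file = DataFile(rows=[{"id": 1}, {"id": 2}, {"id": 3}])
data_file.delete_where(lambda r: r["id"] == 2)
live_rows = data_file.scan()
print(live_rows)
```

The write cost of a delete is proportional to the number of deleted positions rather than the size of the file, which is the property that makes frequent small CDC deletes cheaper.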
Architecture & Governance Magazine · May 2026
The piece argues the open lakehouse has moved from "emerging" to "default enterprise pattern" in 2026, driven by federated catalogs (Apache Polaris graduated in February), open table-format convergence, and the practical demand for engine-swap agility under agentic AI workloads. The argument that resonates for architects: vendor lock-in at the storage layer becomes intolerable when AI workloads demand multiple compute engines hitting the same physical files.
Teradata · May 2026
Announced May 7, the platform bundles Teradata AI Studio, an enhanced Teradata Cloud with Elastic Compute, and an on-premises "Factory" variant on Dell PowerEdge + NVIDIA AI Enterprise for regulated workloads. The pitch is sovereign agentic AI without re-platforming — interesting positioning for an incumbent fighting to stay relevant against Snowflake and Databricks now that data residency is a first-order AI requirement.
Snowflake Docs · May 2026
Guardrails — part of Horizon Catalog — now provide runtime detection of prompt injection and jailbreak attempts across Snowflake Intelligence, Cortex Agents, and Cortex Code, including indirect injection embedded in tool calls. Admins flip a single account-level AI_SETTINGS parameter to enable across surfaces. For platform teams, this pushes the AI-safety control plane down into the warehouse rather than the application layer.
AtScale · May 2026
AtScale's May 20 summit drew 8,500+ practitioners with Vodafone, TELUS, Carrefour, Papa Johns, Blue Yonder, and SlickDeals speaking on the architectural foundations needed to scale AI in production. The recurring framing — semantic layer as the governed interface between agents and data — is now industry consensus among governance-aware platform teams, even as the Open Semantic Interchange standard tries to align dbt, Cube, AtScale and warehouse-native semantics.
VentureBeat · May 2026
VB Pulse's Q1 2026 tracker shows retrieval optimization spending overtook evaluation for the first time, with enterprises pivoting to runtime tool-call retrieval as agents make orders of magnitude more requests than human users. Redis Iris is the headline reference example: its semantic interface auto-generates MCP tools from data models, with 99% of memory on Flex flash at a tenth of in-memory cost. The architectural takeaway for data-platform teams: the retrieval layer is becoming a real-time service tier, not a pre-loaded index.
Astronomer / IBM · May 2026
"Astronomer with IBM" packages Astronomer's managed Airflow as a client-hosted OEM product for regulated industries, with a claimed 70% reduction in data downtime and 20% faster pipeline build/test. For enterprises stuck on self-managed Airflow because neither Cloud Composer nor MWAA meets their residency requirements, this is a credible third path, and a sign IBM is buying its way back into the modern data stack via partnerships rather than acquisition.
BigDATAwire / Acceldata · May 2026
GA worldwide May 19. The platform shifts the observability narrative from "alert humans" to "agents detect, diagnose, and remediate" across five domains — quality, pipelines, infrastructure, usage, and cost. Acceldata's "xLake" framing positions compute as governed-and-portable rather than tied to a single warehouse; whether the autonomous-remediation claim survives contact with regulated change-management practices is the question for platform leads.
Help Net Security / CTERA · May 2026
Announced May 20, InsightAI correlates audit trails, metadata, permissions, capacity trends, and security events across CTERA's unstructured data platform, with an "Ask InsightAI" natural-language assistant for investigation. Deployable as SaaS, in a customer-managed VPC, or fully air-gapped (including AWS GovCloud / Azure Government). Targets compliance reporting, storage cost optimization, and chargeback for the file-data side of the estate.
Soda · 2026
Soda 4.0 unifies observability with a new open-source data contracts engine (Soda Core 4.0) and adds always-on anomaly detection that runs even without manually authored checks. The bet — also visible at Gable and Datafold — is that the contracts-versus-observability split was always a tooling artifact and that platform teams want a single layer that enforces upstream agreements and detects unanticipated drift downstream.
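To make the contracts-versus-observability distinction concrete, here is a minimal sketch of the two kinds of check side by side. It does not use Soda's API. The `CONTRACT` mapping, the function names, and the z-score threshold are all hypothetical, chosen only to show an explicit upstream agreement next to an unauthored drift check.

```python
from statistics import mean, stdev

# Hypothetical contract: required columns and their expected Python types.
CONTRACT = {"order_id": int, "amount": float}


def contract_violations(rows: list, contract: dict) -> list:
    """Contract check: every row must carry each column with the agreed type."""
    violations = []
    for i, row in enumerate(rows):
        for col, expected in contract.items():
            if col not in row:
                violations.append((i, col, "missing"))
            elif not isinstance(row[col], expected):
                violations.append((i, col, "wrong type"))
    return violations


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Drift check: flag a metric far from its own history (simple z-score)."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


rows = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": "2", "amount": 3.0},   # order_id has the wrong type
    {"order_id": 3},                    # amount is missing
]
violations = contract_violations(rows, CONTRACT)

daily_row_counts = [1000, 1020, 990, 1010]
drop_flagged = is_anomalous(daily_row_counts, 120)
normal_flagged = is_anomalous(daily_row_counts, 1008)
print(violations, drop_flagged, normal_flagged)
```

The contract check only catches what someone wrote down in advance, while the anomaly check needs no authored rule, which is the gap a combined layer is meant to close.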
OpenMetadata · May 2026
The May 13 maintenance release patches recently disclosed CVEs, eliminates hotspots in the search and tag pipelines, and improves connector behavior across Databricks, Unity Catalog, Athena, Datalake, and OpenLineage. Notable: Unity Catalog no longer hard-fails on missing httpPath, Iceberg-on-Athena tables now ingest properties from $properties metatables, and PostgreSQL/MSSQL connectors gain mTLS support — a quiet but important release for production users.
Informatica · May 2026
Announced May 20 at Informatica World, the release extends Cloud Data Access Management (CDAM) row-level policy enforcement to Snowflake Iceberg tables under a "build once, deploy anywhere" model. Following its parallel Databricks Lakebase integration earlier this week, Informatica is positioning CDAM as the cross-platform governance plane that floats above whichever lakehouse the workload happens to land on — a credible answer to multi-engine reality.