
KubeCon + CloudNativeCon NA 2025 Recap – O’Reilly

As expected, AI was everywhere at KubeCon + CloudNativeCon in Atlanta this year, but the real energy was focused on something less headline-grabbing and more foundational: solving everyday operational challenges. Amid the buzz about intelligent systems and futuristic workflows, practitioners remained grounded in urgent, practical work: managing tool sprawl, tackling Kubernetes complexity, and confronting the chaos of “day two” operations.

Operations Remains Human-Centered

There’s real promise in AI, especially in areas like automation and observability. But many teams are still figuring out how to integrate AI into legacy systems that are already under stress. What stood out most was how human-centered the cloud native community remains: committed to reducing toil, improving developer experience, and building resilient platforms that work when the pager goes off at 3am.

A prime example of this grounded perspective came from Adobe’s Joseph Sandoval. In his keynote, “Maximum Acceleration: Cloud Native at the Speed of AI,” Sandoval acknowledged the dramatic potential of AI-native infrastructure but made clear it’s not just a tooling revolution. “We’ve entered the agent economy,” he said, describing systems that can “observe, reason, and act.” But to support these workloads, we must evolve Kubernetes itself: “We’re moving from tracing requests to tracing reasoning, from metrics to meaning.” Kubernetes, he argued, has become the foundation for AI, if unintentionally, offering the flexibility and control these systems demand.

This potential is already visible in the real world: Niantic’s Pokémon GO team, for example, demonstrated how they use Kubernetes and Kubeflow to run a global machine learning-powered scheduling platform that predicts player participation and orchestrates in-game events across millions of locations. But autonomy, Sandoval cautioned, only works when it’s built on operational trust: smarter scheduling, adaptive orchestration, and rock-solid security boundaries.
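For readers unfamiliar with Kubeflow, the sketch below shows roughly what a Kubeflow Pipelines v2 definition looks like in Python. It is a minimal illustration, not Niantic’s actual pipeline: the component names, the placeholder scoring logic, and the output filename are all hypothetical.

# Minimal Kubeflow Pipelines v2 sketch; component names and logic are
# hypothetical placeholders, not Niantic's real pipeline.
from kfp import compiler, dsl


@dsl.component
def predict_participation(region: str) -> float:
    # Stand-in for a real model call; returns a fake participation score.
    return 0.42


@dsl.component
def schedule_event(region: str, score: float) -> str:
    # Stand-in scheduling decision based on the predicted score.
    return f"schedule event in {region}" if score > 0.3 else "skip"


@dsl.pipeline(name="geo-temporal-event-scheduling")
def scheduling_pipeline(region: str = "us-east"):
    prediction = predict_participation(region=region)
    schedule_event(region=region, score=prediction.output)


if __name__ == "__main__":
    # Compiles the pipeline to a spec that a Kubeflow cluster can run.
    compiler.Compiler().compile(scheduling_pipeline, "scheduling_pipeline.yaml")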

Andy Zhang at KubeCon 2025
Niantic’s Andy Zhang shares “Scaling Geo-Temporal ML: How Pokémon GO Optimizes Global Gameplay With Kubernetes and Kubeflow” at KubeCon + CloudNativeCon NA 2025, November 11. Photo courtesy of the Cloud Native Computing Foundation.

This call to strengthen foundational infrastructure echoed across the event, especially in platform engineering discussions. Abby Bangser’s keynote framed platform engineering not as yet another revolution but as a response to complexity: “We build platforms to reduce the complexity and scope for those building on top, not to give them new systems to learn.” Great platforms, she argued, are judged not by shiny architecture diagrams but by how effectively they empower developers. Internal platforms become an economy of scale: bespoke to a business yet broadly enabling. And most importantly: “The only success is a more effective and happier development team.” (If you’re interested in going deeper, check out her report, Platform as a Product, coauthored with Daniel Bryant, Colin Humphreys, and Cat Morris.)

Ambitious AI Requires Practical Engineering

Throughout the conference, this emphasis on developer experience and practical operations consistently overshadowed AI hype. That context made the CNCF’s launch of Kubernetes AI Conformance feel especially timely. “As AI moves into production, teams need consistent infrastructure they can rely on,” said Chris Aniszczyk, CNCF’s CTO. The goal is to create guardrails so AI workloads behave predictably across different environments. This maturity is already visible: KServe’s move to incubating status is a sign that foundational work is gradually catching up to AI ambition.

KubeCon 2025 registration
Registration at KubeCon + CloudNativeCon NA 2025, November 10. Photo courtesy of the Cloud Native Computing Foundation.

Meanwhile, the hallway conversations were filled with a very real and immediate concern: the announced retirement of Ingress NGINX, which currently runs in nearly half of all Kubernetes clusters. Teams suddenly had to reckon with critical migration planning, a reminder that while we talk about building intelligent systems of the future, our operational reality is still deeply rooted in managing vital but aging components today.
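For teams starting that migration planning, a first step is simply finding which Ingress resources are still tied to the nginx ingress class. The sketch below, assuming the official kubernetes Python client and a working kubeconfig, is one rough way to build that inventory; it is an illustration rather than a complete migration tool.

# Rough inventory script for migration planning: lists Ingress resources
# that appear to be handled by ingress-nginx. Assumes the official
# kubernetes Python client and a reachable cluster.
from kubernetes import client, config


def find_nginx_ingresses():
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    networking = client.NetworkingV1Api()
    hits = []
    for ing in networking.list_ingress_for_all_namespaces().items:
        annotations = ing.metadata.annotations or {}
        class_name = ing.spec.ingress_class_name
        legacy_class = annotations.get("kubernetes.io/ingress.class")
        if class_name == "nginx" or legacy_class == "nginx":
            hits.append(f"{ing.metadata.namespace}/{ing.metadata.name}")
    return hits


if __name__ == "__main__":
    for name in find_nginx_ingresses():
        print(name)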

There were really two converging stories being told. Platform engineering talks focused on hard-earned lessons and production-hardened architectures. Speakers from Capital One, for example, demonstrated how their internal platform, Dragon, evolved from thoughtful iteration and real-world adaptation over time into a scalable, resilient platform. Meanwhile, the complexities of the emerging AI space were highlighted in sessions like “Navigating the AI/ML Networking Maze in Kubernetes: Lessons from the Trenches,” which detailed how AI/ML workloads are pushing HPC networking concepts like RDMA and MPI into Kubernetes, creating a “new learning curve” and discussing the “intricacies of integrating specialized hardware.”

The real intrigue is watching these worlds collide in real time: platform engineers being asked to operationalize AI workloads they barely trust, and AI teams realizing their models require more than just compute; they still need to solve problems like traffic routing, identity, observability, and failure isolation.

The Ecosystem Continues to Mature

As the ecosystem evolves, some clear frontrunners are emerging. eBPF (especially via Cilium) has become the backbone of modern networking and observability. Gateway API has matured into a strong next-generation alternative to Kubernetes Ingress, with broad support across popular ingress and service mesh providers. OpenTelemetry is becoming the standard for collecting signals at scale. Dynamic Resource Allocation (DRA) and Model Context Protocol (MCP) are two critical extensions clearly emerging as key enablers for the new generation of AI-driven workloads. These aren’t just tools; they’re foundations for a future where infrastructure must be more intelligent and more manageable at once.
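As one small example of what that OpenTelemetry adoption looks like in practice, the sketch below sets up basic tracing with the OpenTelemetry Python SDK. The service name, span names, and console exporter are placeholders; a production setup would export to an OTLP-compatible backend instead.

# Minimal OpenTelemetry tracing setup; service and span names are
# placeholders, and the console exporter stands in for a real backend.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo-instrumentation")
with tracer.start_as_current_span("handle-request") as span:
    # Attach an attribute so the span is queryable downstream.
    span.set_attribute("http.route", "/healthz")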

Solutions showcase at KubeCon 2025
The Solutions Showcase exhibit hall at KubeCon + CloudNativeCon NA 2025, November 11. Photo courtesy of the Cloud Native Computing Foundation.

It’s fitting that the CNCF marked its tenth birthday at this KubeCon: 10 years of evolving an ecosystem shaped not by flashy trends but by consistent, collaborative tooling that quietly powers today’s most critical platforms. With over 200 projects under its umbrella, the foundation now turns toward the AI-native future with the same mindset: build stable layers first, then empower innovation on top. The path forward won’t come from yet another algorithm, agent, or abstraction layer but from the less glamorous, deeply important work: derisking complexity, stabilizing orchestration layers, and enabling the teams who live in production.

The teams slogging through ingress controller deprecations today are building the trust needed for tomorrow’s agent-native systems. Before we can hand over real responsibility to AI agents, we need platforms resilient enough to contain their failures and flexible enough to enable their success. The next event, KubeCon + CloudNativeCon Europe, takes place in Amsterdam March 23–26 in the new year, and we’re looking forward to seeing more sessions that further this conversation.
