Tag

Vellum AI

2 issues found

Apr 16, 2026

Description

Infrastructure Hits Standard The Model Context Protocol’s move to the Linux Foundation, backed by Shopify and Cloudflare, marks the industry’s transition from experimental tool-calling to a standardized "USB port" for agents.
The Planning Plateau New benchmarks like AgentBench 2.0 and AMD’s audit of Claude Code show a 25% performance drop in complex scenarios, highlighting a "20% success ceiling" that infrastructure alone cannot fix.
Code Over JSON Hugging Face’s pivot to Python-based execution in Transformers Agents 2.0 is outperforming traditional structured tool-calling, suggesting the future of agency lies in code-as-action.
Open-Source Parity The gap between closed and open models is evaporating as GLM-5.1 surpasses frontier models on SWE-Bench Pro, moving the competitive moat toward orchestration and environment design.

Description

The Desktop Pivot OpenClaw and Meta’s Manus are moving agents from browser wrappers to local system daemons, redefining the desktop as the primary runtime.
Infrastructure Hardening Anthropic’s MCP and OpenAI’s CUA API are standardizing data integration and computer use, signaling a shift toward enterprise-grade reliability.
Economic Disruption DeepSeek-V3’s massive cost advantage is forcing a pivot toward open-weights reasoning, while frameworks like PydanticAI bring type-safety to agent orchestration.
Beyond JSON The JSON wall is breaking as code-as-action and reasoning loops replace rigid templates to solve high failure rates in complex environments.