agent brief/2025-11-29

Opus 4.5 takes the lead.

time to read4m
time saved140 min
sources529
λsynopses
Anthropic has aggressively redefined the agent landscape with the release of Opus 4.5, which now dominates benchmarks like SWE-Bench with an 87% success rate using sub-agents. Beyond raw performance, the model introduces a 3x cost reduction and persistent memory features, making long-horizon, autonomous engineering workflows commercially viable for the first time. Parallel to this, DeepSeek-Math-V2 is proving that architectural innovation rivals scale. By utilizing a generator-verifier loop and reinforcement learning, it achieved the first open-source Gold on the IMO, showcasing a reasoning pattern that is likely to become standard for reliable agentic thought processes. However, as capabilities scale, so do the attack vectors. Security expert Simon Willison issued a critical clarification this week distinguishing prompt injection from jailbreaking, noting that tool-using agents (such as those on MCP servers) face unique risks of data exfiltration that current guardrails cannot reliably stop. The industry is moving fast: agents are becoming smarter and cheaper, but the security layer remains dangerously thin.

#tags
system operational
end :: 529 signals processed