Teaching Claude why
~ai.alignment ~ai.llms ~research anthropic
> We use agentic misalignment as a case study to highlight some of the…
www.anthropic.com, 2 weeks ago, Tildes

Claude Mythos preview
~ai.alignment ~ai.llms ~dev ~security anthropic
> As we wrote in the Project Glasswing announcement, we do not plan to make…
red.anthropic.com, Apr 7, 2026, Tildes

Eval awareness in Claude Opus 4.6’s BrowseComp performance
~ai.alignment ~ai.benchmarks anthropic
> Claude hadn’t yet discovered it was in BrowseComp, but it had correctly…
www.anthropic.com, Mar 7, 2026, Tildes

Why we are excited about confessions
~ai.alignment ~research ~tech
> A deeper look at confessions, reward hacking, and monitoring in alignment…
alignment.openai.com, Jan 15, 2026, Tildes