~ai.alignment - Search

4 links

Teaching Claude why

~ai.alignment ~ai.llms ~research anthropic

> We use agentic misalignment as a case study to highlight some of the…

www.anthropic.com May 9, 2026 Tildes

Claude Mythos preview

~ai.alignment ~ai.llms ~dev ~security anthropic

> As we wrote in the Project Glasswing announcement, we do not plan to make…

red.anthropic.com Apr 7, 2026 Tildes

Eval awareness in Claude Opus 4.6’s BrowseComp performance

~ai.alignment ~ai.benchmarks anthropic

> Claude hadn’t yet discovered it was in BrowseComp, but it had correctly…

www.anthropic.com Mar 7, 2026 Tildes

Why we are excited about confessions

~ai.alignment ~research ~tech

> A deeper look at confessions, reward hacking, and monitoring in alignment…

alignment.openai.com Jan 15, 2026 Tildes