Posted 4 weeks ago
> A deeper look at confessions, reward hacking, and monitoring in alignment research.