OpenAI detects accidental chain-of-thought grading in models, finds no monitorability loss

Comments

Join the discussion on this story.