The AGI Scientist
All guides
Reading pathThe AGI Scientist · June 2, 2026 · 7 min read

A reading path into alignment

A curated route from 'why is this hard?' to the open problems on the frontier — in a sensible order.

A reading path into alignment

Alignment has a large, scattered literature and no obvious front door. This is one sensible order to walk through it — enough to reach the open problems without getting lost on the way.

1. Why it's hard

Start with the framing: capable systems optimizing a specified objective can satisfy the letter of it while missing the intent. Get comfortable with specification gaming and goal misgeneralization before anything else.

2. What we can measure

Move to evaluation: how do we tell whether a system is aligned at all? This is where honest benchmarks and red-teaming come in — and where a lot of the real work is right now.

3. What we can inspect

Then interpretability: if we can read a model's internals, we can audit goals instead of guessing at them. Pair the primers here with the open probes in the research feed.

4. The open frontier

Finally, the unsolved parts — scalable oversight, corrigibility, multi-agent incentives. Bring your questions to the Safety & Alignment working group; the best reading path ends in a conversation.