Guide · The AGI Scientist · June 12, 2026 · 9 min read
How to run a reproducible experiment
A practical checklist for experiments others can actually re-run — the difference between a result and a rumor.

A result nobody can reproduce is a rumor with a chart. This guide is the checklist we hold our own work to before we publish.
Pin everything
- Environment. Lock dependency versions and record the hardware. "Latest" is not a version.
- Data. Snapshot the exact dataset and its preprocessing. Reference it by a content hash, not a filename.
- Seeds. Set and log every random seed. Report variance across seeds, not a single lucky run.
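The three pins above can be sketched with the Python standard library alone. This is a minimal illustration, not a complete solution: the helper names (`file_sha256`, `environment_record`, `seeded_rng`) are hypothetical, and it only seeds Python's own `random` module. Numerical or ML libraries keep their own random state and need to be seeded separately.

```python
import hashlib
import importlib.metadata
import platform
import random
import sys


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large datasets never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def environment_record(packages=()):
    """Collect interpreter, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed; record that explicitly
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


def seeded_rng(seed):
    """Return a random.Random instance seeded explicitly (stdlib only)."""
    return random.Random(seed)
```

Logging the digest, the environment record, and the seed next to every result is what lets someone else check they are running the same experiment. Passing an explicit generator around, rather than relying on global state, makes it harder for an unseeded call to slip in.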
Make it one command
If reproducing your work takes more than a single command, most people won't try. Ship a script that pulls the pinned data, runs the experiment, and emits the same numbers you're claiming.
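One possible shape for such a script is sketched below. Everything in it is an assumption for illustration: the flag names (`--data`, `--sha256`, `--seeds`) are made up, `run_experiment` is a toy placeholder for real training or evaluation, and the download step is left out, so the script checks a local file against the pinned hash instead of pulling it.

```python
import argparse
import hashlib
import json
import random
import sys


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def run_experiment(data, seed):
    """Toy placeholder metric: mean of a seeded resample of the raw bytes."""
    rng = random.Random(seed)
    resample = [rng.choice(data) for _ in range(100)]
    return sum(resample) / len(resample)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Reproduce the reported numbers.")
    parser.add_argument("--data", required=True, help="path to the pinned dataset")
    parser.add_argument("--sha256", required=True, help="expected content hash")
    parser.add_argument("--seeds", type=int, nargs="+", default=[0, 1, 2])
    args = parser.parse_args(argv)

    # Refuse to run on data that does not match the pinned hash.
    actual = sha256_of(args.data)
    if actual != args.sha256:
        print(f"data hash mismatch: expected {args.sha256}, got {actual}",
              file=sys.stderr)
        return 1

    with open(args.data, "rb") as handle:
        data = handle.read()

    # One result per seed, emitted as JSON so the numbers can be diffed.
    results = {str(seed): run_experiment(data, seed) for seed in args.seeds}
    print(json.dumps({"data_sha256": actual, "results": results}, sort_keys=True))
    return 0
```

To turn this into a single command, call `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. Failing fast on a hash mismatch is the important design choice: a script that quietly runs on the wrong data produces numbers that look reproducible but aren't.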
Report honestly
State what didn't work, the failure modes, and the compute budget. A reproducible negative result is worth more than an unreproducible triumph.
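A report along these lines could be assembled as follows. This is a rough sketch under stated assumptions: `summarize_runs` and its field names (`failed_seeds`, `gpu_hours`) are hypothetical, and the spread is reported as the sample standard deviation, which is one reasonable choice among several.

```python
import statistics


def summarize_runs(scores_by_seed, failed_seeds=(), gpu_hours=None):
    """Summarize per-seed scores alongside failures and compute spent."""
    scores = list(scores_by_seed.values())
    if not scores:
        raise ValueError("at least one successful run is required")
    return {
        "n_seeds": len(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else None,
        "min": min(scores),
        "max": max(scores),
        "per_seed": dict(scores_by_seed),
        # Seeds that crashed or diverged are listed, not silently dropped.
        "failed_seeds": sorted(failed_seeds),
        "gpu_hours": gpu_hours,
    }
```

For example, `summarize_runs({0: 1.0, 1: 2.0, 2: 3.0}, failed_seeds=[3], gpu_hours=12.5)` gives a mean of 2.0 and a standard deviation of 1.0, and keeps the failed seed and the compute budget in the same record as the headline number.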
Then publish it open
Put the code, the config, and the artifacts where the community can reach them — and hand the next researcher a higher starting point.