By request. I probably should look up earlier statements before writing this, but I hope I am consistent. Consider an education intervention and a set of tests that it must pass. The intervention could be “more spending” or “method X used in the classroom” or “longer school days” or “charter schools” or what have you.
1. It should show a meaningful difference under experimental conditions, meaning that selection bias is eliminated.
2. The difference should persist, rather than fade out. If you show a difference in first grade but by third grade or fifth grade the experimental group is on on the same level as the control group, then there is fade-out.
3. The results should be replicated. One experiment that works one time does not count.
4. The intervention should be scalable. The intervention does not depend on a uniquely gifted teacher.
The Null Hypothesis is that no intervention passes all four tests.