Against the null hypothesis

Maya Escueta and others write (AEA access required),

Two interventions in the United States stand out as being particularly promising—a fairly low-intensity online program that provides students with immediate feedback on math homework was found to have an effect size of 0.18 standard deviations, and a more intensive software-based math curriculum intervention improved seventh and eighth grade math scores by a remarkable 0.63 and 0.56 standard deviations.

This is from a survey article on the effectiveness of various forms of educational interventions that use technology.
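
To make those numbers concrete: an effect size is the difference between treatment and control means divided by the standard deviation of scores, so under a rough normality assumption it translates directly into percentile terms. A back-of-the-envelope sketch (mine, not the article’s):

```python
# Rough translation of effect sizes into percentile gains, assuming
# normally distributed test scores. Labels paraphrase the quoted findings;
# the arithmetic is just the normal CDF evaluated at the effect size.
from scipy.stats import norm

effects = [("online homework feedback", 0.18),
           ("software math curriculum, 7th grade", 0.63),
           ("software math curriculum, 8th grade", 0.56)]

for label, d in effects:
    pct = norm.cdf(d) * 100  # percentile the median control student would reach
    print(f"{label}: {d:.2f} SD -> {pct:.0f}th percentile")
```

On that reading, 0.18 standard deviations moves a median student to roughly the 57th percentile, and 0.63 to roughly the 74th.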

11 thoughts on “Against the null hypothesis”

  1. We know about the problem of looking for keys under the streetlight, because that’s where the light is, and we address it by also looking a little further away from the streetlight:

    The “file drawer” problem—the notion that studies with significant results are relatively more likely to be published, while studies showing null results tend to be filed away—presents a perennial challenge for literature reviews oriented toward impact evaluations. While no review can fully circumvent this challenge, we took steps to minimize its presence within this article. In particular, we chose not to exclude any studies based on publication status. Our final list thus consists of published academic articles, working papers, evaluation reports, and unpublished manuscripts. Furthermore, we conducted extensive consultations with leading researchers, evaluators, and practitioners in the field, asking each about every study that s/he was aware of in his or her area of specialization, whether or not the study was published or unpublished, and whether its findings were significant or null. The file drawer problem may extend beyond publication bias, in that papers may not even be written up if null results are detected. While we cannot entirely rule out this possibility, we believe we have taken all feasible steps to avoid it, and that our approach is certainly more effective in doing so than the modal literature review approach of solely performing keyword searches within databases that consist entirely of published articles.
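
    The inflation being guarded against is easy to reproduce. In a toy simulation (mine, not the authors’), a treatment with a true effect of exactly zero still yields a sizable average “effect” once only the significant positive results escape the file drawer:

    ```python
    # Toy file-drawer simulation: the true effect is 0, but averaging only
    # the studies that reach p < 0.05 (in the positive direction) produces
    # a spurious effect size in the "published" record.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    published = []
    for _ in range(2000):                  # 2,000 hypothetical studies
        treat = rng.normal(0.0, 1.0, 50)   # true effect is exactly 0 SD
        control = rng.normal(0.0, 1.0, 50)
        d = treat.mean() - control.mean()  # estimated effect, in SD units
        if ttest_ind(treat, control).pvalue < 0.05 and d > 0:
            published.append(d)            # only this study "gets published"

    print(f"{len(published)} of 2000 studies published; "
          f"mean published effect = {np.mean(published):.2f} SD")
    ```

    With these settings only about 2.5% of the null studies get “published,” yet their mean effect comes out at roughly 0.45 SD even though the truth is zero, which is why a review that only searches databases of published articles can badly overstate what works.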

  2. My version of the Null Hypothesis talks about lasting improvement. Assuming those numbers are correct, what is the time frame? Is it a day after the lesson? A week? A month? A year? High school graduation? My experience as a high school teacher was that even when students knew enough to do well on an end-of-unit test, they had forgotten a good deal within two months. My small experience with educational research was that “outcomes” were evaluated less than two months out.

    • The Collegiate Learning Assessment is kind of like that, but it is more a general-purpose standardized test of college-level skills than a test of retention of specific knowledge, which is partly why a college’s CLA results look almost exactly like its published SAT score range for new matriculants.

      If you give college students a CLA test at the start of college and another four years later at the end, there is usually not much improvement.

    • What’s needed is an integration of papers like this and brain research. My experience is that I will be able to recall new knowledge that I continue to use. If I don’t use it, it seems to be filed in a sort of outline form somewhere in my brain. If it sits there for months or years it can be reactivated with renewed use. My hypothesis is that teachers remember the fine details of their subject matter precisely because they continue to teach it year after year or quarter after quarter. If someone were to get a degree in some area of education, and then become a plumber or an electrician, their knowledge would grow in the realms useful for those vocations, and the rest would recede.

      • Absolutely! Which is why it is ridiculous to assume that once a student passes a test (or a course) in a subject, she now knows the material. If she continues to use some of it, she will remember that much. If not, the term of art in the ed business is “fade-out.”
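
        To put a toy model behind “fade-out”: the classic Ebbinghaus picture is exponential decay of retention, with continued use corresponding to a much larger stability constant. The numbers below are invented for illustration, not estimated from any study:

        ```python
        # Illustrative Ebbinghaus-style forgetting curve. `stability` is the
        # decay time-constant in days; its values here are made up purely to
        # contrast an unused skill with a regularly exercised one.
        import math

        def retention(days_elapsed: float, stability: float) -> float:
            """Fraction of material still retained after `days_elapsed` days."""
            return math.exp(-days_elapsed / stability)

        for days in (1, 7, 30, 60, 180):
            print(f"after {days:>3} days: "
                  f"unused {retention(days, stability=20):>4.0%}, "
                  f"in regular use {retention(days, stability=400):>4.0%}")
        ```

        Under these made-up constants an unused skill is down to about 5% at the two-month mark while a regularly exercised one is still near 86%, which is the pattern described in this thread: end-of-unit mastery, then fade-out unless the material keeps being used.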

  3. Most literate parents have taken to heart the idea that if they want their child to read well, it is incumbent upon them to take time daily to read with the child and practice at home. The null hypothesis doesn’t extend to parental involvement. But many parents have been put off taking the same responsibility for math. One wonders how widely known it is that even math has been thoroughly politicized in public elementary classrooms. (See: https://thefederalist.com/2018/08/30/professor-worked-common-core-tests-math-needs-downplay-objects-truth-knowledge/ )

    The “cram school” model might be a good alternative for parents intimidated by the idea of teaching math. Based on educational outcomes in countries where cram schools are widely popular, one would think the null hypothesis would be rejected in their favor. The long-standing slur against Asian education practice has been that it stifles creativity. Yet, compared to the stifling conformism of the woke environment in the USA, China seems a veritable oasis of tolerance and creativity. Even in areas like film, where in distant memory the USA was dominant, China outperforms with remarkably creative and inspiring films. Leap, for example (streaming included with Prime), is a better sports movie than anything produced in the USA this century.

    In areas where such extracurricular schooling is not available, software might be a good alternative, especially if it has been tested and found efficacious. The teachers unions may be successful in preventing any substantive learning from occurring in public schools, but they can’t stop parents from doing right by their children.

    • Cram schools are popular in the US, and so is parental involvement in children’s math education, but only and consistently with one group: East Asian Americans (and perhaps high-caste Indians). Unsurprisingly, their children outperform the others, with income being only a minor factor in the overall outperformance.

  4. Software-based math learning is so obviously correct that it is no surprise it defeats the null hypothesis. Fast feedback, targeted practice at weak spots, and spaced repetition are well-known techniques for improving learning, and all of them can be done better by software than by humans. Instead of waiting until the end of the week for the quiz, or a day (at best!) for homework to be graded, feedback is instant. Software can personalize problems to the learner in a way that no teacher in front of a classroom can. Even a tutor will run out of problems in a problem set; a computer program can generate an effectively infinite number of variations of the same math problem. And spaced repetition is easy to program into a software curriculum.

    If you want to try it for yourself, Khan Academy is free, and has math from first grade through calculus, linear algebra, and statistics.
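
    A minimal sketch of the three mechanics named above, i.e. generated problem variants, instant feedback, and Leitner-style spaced repetition. This is my own illustration of the general techniques, not how Khan Academy or any particular product works:

    ```python
    # Toy drill engine: each skill gets freshly generated problem variants,
    # answers are checked instantly, and a Leitner-box schedule decides when
    # the skill comes up for review again.
    import random

    INTERVALS = [1, 3, 7, 21]  # review gaps in days for Leitner boxes 0..3

    def make_problem(skill: str) -> tuple[str, int]:
        """Generate a fresh variant, so the problem set is effectively unlimited."""
        a, b = random.randint(2, 12), random.randint(2, 12)
        if skill == "multiplication":
            return f"{a} x {b} = ?", a * b
        return f"{a} + {b} = ?", a + b  # default skill: addition

    def grade(answer: int, correct: int, box: int) -> tuple[bool, int, int]:
        """Instant feedback plus the Leitner update: a correct answer promotes
        the skill to a longer review interval; a mistake demotes it to box 0."""
        if answer == correct:
            box = min(box + 1, len(INTERVALS) - 1)
            return True, box, INTERVALS[box]
        return False, 0, INTERVALS[0]

    question, correct = make_problem("multiplication")
    ok, box, days = grade(answer=42, correct=correct, box=1)
    feedback = "correct!" if ok else f"not quite, it is {correct}"
    print(f"{question}  {feedback}  (next review in {days} day(s))")
    ```

    The design choice that matters is the demote-on-error step: it is what automatically concentrates practice on weak spots.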

    • The big problem with software-based math learning has been that most students won’t stay with it. Students who are interested in math or are self-starters do great with it, but the majority of students aren’t like that.

    • There is a belief among tech folks that software-based learning should outperform anything else. It’s not that it is so good now, but that with software you can keep making improvements over time: A/B testing different math problems, the order in which they are presented, the metaphors used to describe concepts. All of this can be improved over time. You can even measure retention over time and use that as an outcome. And making things more compelling and sticky is exactly what we’ve been using A/B testing for over the last 20 years. (A toy example of such a test appears at the end of this thread.)

      I basically agree with this approach; however, having kids and making commercial software has made me appreciate the challenge a bit more.
      1. A software course that is compelling and optimized is a major undertaking: think many years to develop. Each action needs to be rewarded; think animation and virtual prizes. There will be characters and voice actors, and every screen needs to be designed. It would probably work best as character-based: teaching your avatar how to read, etc. It’s the design of a major video game, per grade, per subject, and it will cost tens or hundreds of millions of dollars to produce. This is the difference between a simple software app where lots of stuff is just repeated, with the same rewards, and something that outperforms humans, which will require constant novelty across the entire curriculum. Hundreds of artists, teachers, and programmers will be required to build it.

      2. We can use other tricks to manipulate students into completing assignments; social software is the way to go, but effective and stressful may be the same thing in this context. We can keep your kids tapping, and learning, but parents aren’t going to like how we manipulate them. We need to addict kids to the software, and it won’t be that hard: hire some Facebook and Twitter folks, treat the whole thing as a game, add a social media layer, and we’ll beat human teachers.

      3. We’ll still need teachers to help students, but they can now work more one-on-one; the software will have a teacher layer, giving them instructions on what to focus on with each student.

      4. We may never develop software that works. Developing the learning content is easy, but it won’t perform without being compelling and entertaining, which is hard. So everyone will try the easy part first and never show enough benefit to bother with the hard part.

      The Khan Academy Kids app is pretty good for pre-schoolers; I’m sure it cost more than $10 million to produce.

      Apps this big don’t have much competition, since they cost so much to develop. We should be okay with that, and have them run by Disney and Google.
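
      To ground the A/B-testing point raised earlier in this thread: here is a toy two-variant comparison of problem-set completion rates using a standard two-proportion z-test. The counts are invented; only the harness is real:

      ```python
      # Toy A/B test: did variant B's problem ordering improve completion
      # over variant A's? The counts below are made up for illustration.
      from math import sqrt
      from scipy.stats import norm

      completed_a, shown_a = 412, 1000  # variant A: original problem order
      completed_b, shown_b = 465, 1000  # variant B: reordered problems

      p_a, p_b = completed_a / shown_a, completed_b / shown_b
      pooled = (completed_a + completed_b) / (shown_a + shown_b)
      se = sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
      z = (p_b - p_a) / se
      p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test

      print(f"completion: A {p_a:.1%} vs B {p_b:.1%}  (z = {z:.2f}, p = {p_value:.3f})")
      ```

      The same harness measures retention rather than engagement if the outcome is completion of a delayed review set instead of the immediate one, which is the “measure retention over time and use that as an outcome” idea above.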

  5. I’m optimistic. I believe online apps can teach today’s kids math better than the status quo.
