The Null Hypothesis in Education is Hard to Disprove

Tyler Cowen reports on a study that shows no difference between students taught traditionally and students taught with a blended-learning approach (combining on-line and in-person teaching). Tyler entitles his post “The Hybrid Educational Model Works.” I would only have said that if the blended-learning approach were shown to be better.

In education, the null hypothesis is that nothing makes a long-term, scalable, replicable difference. That is:

1. Take any pedagogical innovation or educational intervention.

2. Subject it to a controlled experiment.

3. Evaluate the experiment’s outcome several years later.

4. If the experiment works, attempt to replicate the experiment in more situations.

By the time you reach step 4, if not sooner, you will be unable to show that the innovation makes any difference in outcomes. What this suggests to me is that in the long run it is the characteristics of the students that determine outcomes, at least on average. Think of an individual student as “predestined” to reach a certain outcome. An educational intervention can disturb their path to the predestined outcome but will not change the outcome. I do not literally believe this model, but it is a null hypothesis that is difficult to disprove.
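The “predestined outcome” model above can be sketched as a toy simulation. Everything in it is an assumption made for illustration and none of it comes from the study or from any data: the score scale, the size of the initial boost, the yearly decay rate, the noise level, and the function name `simulate` are all hypothetical. Each student gets a fixed destiny, the intervention adds a boost that fades each year, and the measured gap between treated and control students is compared at an early and a late evaluation.

```python
import random
import statistics


def simulate(n_students=2000, boost=5.0, decay=0.5, years=(1, 5), seed=0):
    """Toy version of the 'predestined outcome' null hypothesis.

    Hypothetical assumptions: outcomes sit on an arbitrary score scale,
    the intervention adds `boost` points, and the surviving boost is
    multiplied by `decay` for every year that passes.
    """
    rng = random.Random(seed)
    # Each student: (predestined outcome, randomly assigned to treatment?)
    students = [
        (rng.gauss(100, 15), rng.random() < 0.5) for _ in range(n_students)
    ]
    gaps = {}
    for year in years:
        treated, control = [], []
        for destiny, is_treated in students:
            noise = rng.gauss(0, 5)  # measurement noise at evaluation time
            effect = boost * decay ** year if is_treated else 0.0
            group = treated if is_treated else control
            group.append(destiny + effect + noise)
        # Measured treatment effect: difference in group means.
        gaps[year] = statistics.mean(treated) - statistics.mean(control)
    return gaps


if __name__ == "__main__":
    for year, gap in simulate().items():
        print(f"year {year}: treated minus control = {gap:.2f}")
```

Under these assumptions the gap at the later evaluation is smaller than the gap at the earlier one, which is the pattern the null hypothesis predicts. The sketch only restates the model; it is not evidence for it.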

8 thoughts on “The Null Hypothesis in Education is Hard to Disprove”

  1. That’s why I find it odd that Bloom’s 2-Sigma “Problem” hasn’t received more attention:

    Bloom found that the average student tutored one-to-one using mastery learning techniques performed two standard deviations better than students who learn via conventional instructional methods — that is, “the average tutored student was above 98% of the students in the control class”. Additionally, the variation of the students’ achievement changed: “about 90% of the tutored students… attained the level of summative achievement reached by only the highest 20%” of the control class.

    • That is interesting. One’s first assumption is that one-to-one tutoring would be too expensive, but that depends on the amount of time the tutoring takes.

      Interestingly this is the approach many take to learning when a credential is not involved. An example would be learning to play an instrument. In the case of learning to play an instrument it is not expensive because the tutoring time is low.

  2. If hybrid education can get the same results for cheaper how is that not an “it works” result?

  3. Step 4 is usually the kicker. Lots of things work if you have a really smart education researcher running the experiment, or if you restrict the experiment to super students who are eager to learn a lot on their own. However, very few things work once you try to apply them more widely.

    One exception is time on task: if you have students work longer studying a particular subject, they learn more about that particular subject. No matter the teacher, no matter the student, no matter the method. Such exceptions prove the rule, though.

    Arguably, this is an argument for letting people be educated in different ways. If you talk to top-tier STEM students, they often express frustration about their earlier education holding them back so that they can sit in a nice tidy rectangular row of desks with 30 other students and crawl through a state-mandated curriculum. For such students, letting them spend an hour in the library would be an improvement for them, and hence an improvement for us all.

  4. There’s an additional confounding effect raised a bit by the debates on “home schooling”…

    ALL children from something resembling a household are to some degree home schooled. Well or poorly, lots of effort or none, etc.

    In the modern era, almost all children will have some exposure to various web resources. So a contemporaneous comparison of “100% traditional” versus “home schooled” vs “not home schooled” vs “hybrid web” is practically impossible – there are probably no 100% traditional students left, if any ever existed in the first place.

  5. An education intervention that makes a difference:

    1. Teach students to play the violin.
    2. Compare the violin playing abilities of students who have taken violin lessons for five years against students who have not taken violin lessons for five years.
    3. Evaluate the outcomes several years later: Do those who took violin lessons for five years play the violin with greater proficiency than those students who were never trained on the violin?
    4. Replicate: Compare similar cohorts in different regions, nations, etc. Does one typically find that students who have spent five years learning to play the violin (where “learning to play the violin” consists, say, of five hours or more per week) play the violin better than those who never had such training?

    I would expect that no one would be the least bit surprised to find that the null hypothesis would be disproved in such a situation. Moreover, we could trade “chess” for “violin” and most would still accept that the null hypothesis would be disproved. Or even “reading.” Or “speaking Spanish.” Etc.

    No one doubts that if one compares one group that receives significant practice in an activity against another group with no exposure to the activity at all, a treatment effect exists.

    Why then are so many people skeptical that interventions in education make a difference? Largely because the comparisons exist between idiotic variations within a government-dominated industry. It is especially bizarre to me that libertarians fail to understand this fact. Just because the comrades in Vladivostok were not able to replicate the results of the comrades in Moscow does not imply that effective innovations do not exist.

    Consider the fate of pedagogical innovation A:

    1. A is successful somewhere.
    2. The “controlled experiment” typically consists of either professors and/or bureaucrats attempting to replicate what A did somewhere else: “In second location, children were placed in classrooms with violins for 300 minutes per week just as A had done.” Do we know that the teacher in this classroom even knew how to play the violin?
    3. Evaluate the outcome later: One finds an effect if and only if the teacher(s) in the other classrooms were, in fact, capable of teaching violin playing. The fact that they had children hold the violins in a certain position for a certain number of minutes per week is irrelevant.
    4. If it works, replicate in more situations: Again, even if we happened to have actual violin teachers in the original set of controlled experiments, do we have actual violin teachers in the alleged replications?

    For those who claim that the null hypothesis in education is hard to disprove, how do they explain a wide variety of skills, such as violin playing, where people who have taken several years of lessons certainly seem to be able to play better than the rest of us do? What of foreign language instruction? Might people with five years of training in learning to speak a language be better at it than one who has never been exposed to the language?

    Clearly some kinds of instruction have a non-trivial impact in some situations. And yet the government-academic-bureaucratic-foundation blob has been unable to raise the test scores of certain cohorts of children in any consistent manner. Is this because somehow education in math and reading is fundamentally different from education in violin playing and Spanish? Or might it have more to do with the nature of a socialist education system?

    Moreover, even private schools in a socialist system may not have access to adequately trained teachers. If most schools had to hire government licensed “violin teachers,” and one acquired such a license not by learning to play the violin, but rather by obtaining credits from education schools, then over time “violin teachers” would come to signify those certified professionals, licensed by the state, with academic degrees, who may not actually be able to play, let alone teach, the violin at all. In such a universe, one would indeed find that there was no treatment effect from “being taught the violin” because the words “being taught the violin” had become at that point a charade.

    In such a universe, varying the number of minutes of instruction, the manner in which the bow is held, different techniques of violin instruction, etc. would all show no impact. Precisely because every once in a while a “violin teacher” was, in fact, someone who actually knew how to play the violin, occasionally there would be a great deal of excitement: Look! This innovation works! And then just as Arnold notes, as attempts to replicate the exceptional results in our world fail, so would they in this hypothetical world. “We evaluated forty classrooms in six districts in which violin teachers taught for the recommended 58 minutes each day using the bow technique recommended by A, but no measurable improvements were detected.” Duh.

    Note that with respect to violin playing and Spanish fluency there exist exogenous standards that transcend the nonsense of the academy. A certified Spanish teacher probably does know some Spanish. But a certified math teacher may be someone who passed mostly education courses in college, along with a few of the easiest math courses. Such a person may not be able to “think mathematically” in any sense that would be respected by mathematicians (mathematicians usually despise math teachers). And yet most Spanish teachers probably do know a bit of Spanish – in part because there are native Spanish speakers who provide a credible and immediately available external benchmark of fluency (as opposed to the faux “violin teachers” in my example above).

    A quick google finds a literature that suggests that native speakers tend to be more effective than non-native speakers in teaching a language:

    “The current study produced findings that are in accordance with previous studies in the field of teaching English as a foreign language, such as the dominance of the Native Speaker Model (Amin, 2001; Braine, 1999; Brutt-Griffler & Samimy, 1999; Canagarajah, 1999; Medgyes, 1994), the detrimental consequences of non-native teachers’ lack of confidence in their proficiency (Medgyes, 1999; Pavlenko, 2003; Varghese et al., 2005), the deterioration of language proficiency when teachers are not regularly involved in the teaching of advanced level classes (Armour, 2004), and the importance of collaboration between native and non-native language teachers (Kamhi-Stein, 1999; Lazaraton, 2003; Pessoa & Sacchi, 2002).”

    Imagine that!

    For more on the conditions necessary in order to observe a treatment effect, see my posts at Bleeding Heart Libertarians:

  6. I thought that direct instruction had beaten the null hypothesis. That is in comparison to our current public schools.


    Off the bat it is cheaper. And that in and of itself is enough. Even if you can’t raise the peeps up at a macro level, you CAN reduce the burden of having this social compulsion in the first place.

    But Arnold, you can take this to the bank – iteration works.

    What current studies of MOOCs don’t yet capture is:

    1. Evergreen – this is the cheaper argument – we don’t need the community college professor to repeat the same words over and over when the very best teacher in the state can say it once to a camera and beat the community college professor with 70%+ of his students.

    2. Iteration – this is the big one. There are 62 kinds of Americans, and fewer of them going to college, and there are three types of learners. And the Internet is proof, the final winning argument, that just as web page analytics have gotten better YOY, evergreen MOOC content will too.

    Sooner than you think we will KNOW that 27% of Hispanics in these zip codes tended to stop a video at 15:45 and they stepped back the timecode and rewatched, and then they clicked out, and then they searched the key terms “tax wedge.”

    And the job of the rock star professor isn’t to repeat the same words, his job is to iterate / invent a way of moving those learners into a description that fits their personae.

    It’ll capture 40% of the target fix, and then we iterate again.

    At a video timecode level, this is essentially creating a multi-stream choose your own adventure, where the story is the SAME, but the variations are real, but they are only to tell the same story to different kinds of people.

    That’s what’s coming.