Doubts about teacher value added

Marianne Bitler and others write,

Using administrative data from New York City, we find estimated teacher “effects” on height that are comparable in magnitude to actual teacher effects on math and ELA achievement, 0.22σ compared to 0.29σ and 0.26σ respectively. On its face, such results raise concerns about the validity of these models.

. . .our results provide a cautionary tale for the naïve application of VAMs to teacher
evaluation and other settings. They point to the possibility of the misidentification of sizable teacher
“effects” where none exist. These effects may be due in part to spurious variation driven by the typically
small samples of children used to estimate a teacher’s individual effect.

VAMs = value-added measures. Pointer from a reader. I note that some recent NBER working papers are now free downloads. Others are not. This one is.

Lest you miss the point, this paper shows that the same methods that purport to show an effect of teachers on student achievement also show an effect of teachers on student height. But the effect of teachers on height is almost surely spurious. So the effect of teachers on achievement may also be spurious.
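To see how small samples alone can produce apparent teacher “effects,” here is a minimal simulation sketch. It is not the paper's model: the function name, the class sizes, and the use of raw class means on a standardized outcome are all assumptions made for illustration. Every student outcome is pure noise, so the true teacher effect is zero by construction, yet the per-teacher means still vary.

```python
import random
import statistics

def sd_of_class_means(n_teachers, class_size, seed=0):
    """SD of per-teacher mean outcomes when the true teacher effect is zero.

    Hypothetical helper for illustration; outcomes are standardized (SD = 1).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_teachers):
        # Each student's outcome is pure noise: no teacher contributes anything.
        outcomes = [rng.gauss(0.0, 1.0) for _ in range(class_size)]
        means.append(statistics.fmean(outcomes))
    return statistics.pstdev(means)

if __name__ == "__main__":
    # Illustrative sample sizes (assumed, not taken from the paper).
    for size in (25, 100, 400):
        print(size, round(sd_of_class_means(2000, size), 3))
```

The spread of the class means should fall roughly as one over the square root of the number of students per teacher, which is why a single year of data for one classroom can make noise look like a sizable effect.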

1. This provides vindication for Jerry Muller’s The Tyranny of Metrics.

2. It provides support for the Null Hypothesis.

3. The research that seemed to show a big effect of teachers (e.g., Raj Chetty on kindergarten teachers) got a lot of play in the press. But that had social desirability bias going for it. I would be surprised if this paper receives similar notice.

12 thoughts on “Doubts about teacher value added”

  1. My guess is that this would describe 90% of workers in the US, and there is probably something to Tyler Cowen’s Average Is Over. I told one of my kids in high school, you can’t blame the bad teacher for getting a bad grade. It is an important lesson that you have to work with a lot of people you don’t like and still perform.

    That said, I still think if conservatives want to reorganize our education system, they should include teachers as stakeholders, as I don’t think public education is going away anytime soon. (I still say one of the biggest hurdles to more private schools and support for vouchers is the simple problem of daily transportation.)

    • Regarding daily transportation:

      – Most public school starts at Kindergarten. Most of the market for day care for younger children is privately run schools. Of course, parents shop for a day care option by location and convenience. That’s not an insurmountable problem.

      – Where I live, in Austin, Texas, the city assigns each student a “home school”, but allows parents to apply for transfers to other public schools within the system. A big caveat is that with transfers, the family is responsible for daily transportation and does not get bus service to the home. Doing some quick searches for stats, there are 84.5k students in the public school district, and only 23k use the official school busing service. I can’t find public stats on how many students choose a transfer school vs. their home school, but clearly, many families are already willing to provide their own transportation for more choice of school.

  2. Mankiw study on taxation by height or income being equivalent. I remember it, posted right here on this blog years ago.
    What is it about tall people doing better? They can see farther? Intelligent women like tall men? I haven’t looked at the cause and have no ready answer.

    • Note that the authors of the paper give reasons to reject the hypothesis that their results are due to some factor connecting height to achievement.

    • Height is associated with childhood nutrition, so it may act as a surrogate for parental resources or quality of parenting. In this particular case, I interpret the finding as follows. If you have teacher data for only a year or two, there will be some teachers who by chance had students with better parents. These teachers will measure greater value-added. But because value-added seems to be correlated with height, perhaps the teachers’ high value-added wasn’t due to their teaching quality, but rather that they had more apt students. This statistical problem should fix itself with more years of data due to the law of large numbers.

  3. I’ve long thought that the more pleasing a research finding, the less one should have faith in it. However, this doesn’t imply that displeasing findings should be very credible, since they are often “shocking” findings and there is a niche for such work.

    Maybe your Null Hypothesis generalizes to other areas of social science? Basically, that no treatment effect in social science is sizable, robust and long-lasting. (E.g., Jeffrey Sachs’s Millennium villages to name one.)

  4. I believe there are some great teachers who can, and do, have a big, positive effect on a few students.
    Sometimes there is a critical mass of such students in one class, so that the great teacher has a big positive effect on most of the kids, since these kids had been underperforming for many years due to lousy teachers, parents, and peer students. (Maybe good or lousy peer students dominate, so getting into a “good” school is crucially important in order to have good peers. Probably.)

    Most of the time, the “great” teacher doesn’t reach such a critical mass in a poor school, and in already good schools, there is not as much unrealized potential.

    There is no “great education” possible for below avg IQ students, tho such people can, and should, be taught better life skills (in school as well as in home), plus have opportunities to learn how to work with their hands in some trade and/or craft.

  5. At minimum, mostly fake news.

    Read or skim the paper.

    VAM is supposed to use shrinkage across multiple years. That’s the proper technique.

    The authors acknowledge very late in the paper:

    “When we apply the shrinkage across multiple years, the teacher effect on height goes away.”

    To summarize their paper:

    1. When we followed the recommended approach, we found zero height relationship. ZERO.

    2. But that doesn’t make a publishable paper.

    3. So we used a non-recommended approach, justifying it by saying sometimes other dumb people use it.

    4. Result: publishable paper!

    • Yeah, this. Except it is true that lots of people (not only dumb ones) talk about the full dispersion of teacher effects from a single year as though it’s a good measure of how important teachers are. Clearly that’s overly optimistic (not shrunken, as you point out), and the paper is a creative debunking of that idea.
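The shrinkage idea discussed in this thread can be sketched with a simple empirical Bayes calculation. This is one common textbook form, not necessarily the procedure the paper or any particular VAM implementation uses; the function names, the three-year panel, the class size of 25, and the true effect SD of 0.2 are all assumptions made for illustration. Each teacher's multi-year mean is scaled by an estimated reliability, the share of variance in those means that persists across years.

```python
import random
import statistics

def simulate(n_teachers, n_years, class_size, true_sd, seed=0):
    """Return per-teacher lists of yearly class-mean outcomes (hypothetical data)."""
    rng = random.Random(seed)
    noise_sd = 1.0 / class_size ** 0.5  # sampling noise of one class mean
    data = []
    for _ in range(n_teachers):
        effect = rng.gauss(0.0, true_sd)  # persistent teacher effect
        data.append([effect + rng.gauss(0.0, noise_sd) for _ in range(n_years)])
    return data

def shrunken_sd(data):
    """SD of empirical-Bayes shrunken teacher effects."""
    n_years = len(data[0])
    teacher_means = [statistics.fmean(ys) for ys in data]
    # Year-to-year variance within a teacher estimates the yearly noise.
    noise_var = statistics.fmean(statistics.variance(ys) for ys in data)
    total_var = statistics.pvariance(teacher_means)
    # Persistent variance: what remains after removing noise in the mean.
    signal_var = max(0.0, total_var - noise_var / n_years)
    reliability = signal_var / total_var if total_var > 0 else 0.0
    return statistics.pstdev([reliability * m for m in teacher_means])

if __name__ == "__main__":
    height_like = simulate(2000, 3, 25, true_sd=0.0)  # no real effect
    score_like = simulate(2000, 3, 25, true_sd=0.2)   # assumed real effect
    print(round(shrunken_sd(height_like), 3), round(shrunken_sd(score_like), 3))
```

In this sketch, an outcome with no persistent teacher component should have its estimated effects shrunk nearly to zero, while an outcome with a real persistent component keeps most of its dispersion.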

  6. Thanks MG for flagging that.

    That said, even well-designed measures used to manage state-run K12 monopolies have not had impact at scale over time (DCPS is small and not what I would call scale). The reasons why are apparent:
    * These K12 monopolies are politically run (local board or mayoral elections where employees elect randos to “govern the school board”) leaving goals and strategy to the whims of whoever can get out the most votes for off-cycle elections.
    * They also don’t have the most important type of feedback needed to improve (parents taking per student funding and voluntarily walking away or staying).

    Could be wrong but I suspect these types of measures will be embraced in schools of choice once they get to large scale (100s and 1000s of schools).

    • Perhaps. We used a version of this approach at Bridge (500 schools when I left, now 900). It was just to help identify the tail — teachers who seemed like lowest 1% based on negative student growth. Then we’d cross check with in-person observations.

      One opportunity to improve this tool is one test every trimester instead of every year. But obvious trade-off on “too much time spent testing.”
