Metrics meet the Null Hypothesis

From a podcast with Russ Roberts and Jerry Muller:

what’s so striking when you read through a lot of this literature on pay-for-performance and standardized measurement combined with pay-for-performance is: How often the scholarly literature shows, in a variety of fields, that it doesn’t work. And yet, politicians, policy-makers, they don’t seem to get the message.

People who are determined to try central planning aren’t interested in theories or evidence that indicate that central planning does not solve the problem.

It occurs to me that among the many problems with metrics in health care or education is that often the best way to look good is to be very selective about your customer base. Schools with children of affluent, two-parent households will tend to look “good.” Doctors who see mostly healthy, conscientious patients will look “good.” And so on.

The whole interview is interesting. Also, people seemed to like my essay on Jerry’s book.

12 thoughts on “Metrics meet the Null Hypothesis”

  1. “And yet, politicians, policy-makers, they don’t seem to get the message.”

    This presupposes that a plausible view that is contrary to the politician’s existing message is considered important. I see little evidence that supports this idea. Politicians, on average, tend to see what makes them and their party electable. Whether or not there were ever “good old days” when actual facts mattered, we are certainly not in them now.

  2. It cannot be said too often: “Good schools do not make successful students; successful students make good schools.”

    If you moved the students of Bridgewater State up to Harvard and the students of Harvard College down to Bridgewater State, who do you think would be more successful in twenty years? I know how I’d bet.

  3. Let’s say we’re looking at a situation in which the metrics are bad. How do we decide between these two interpretations: 1) These particular metrics are bad, but others might be good, and 2) Metrics in general are a bad idea and approach to this particular matter? How can we tell whether someone who doesn’t want measurements for some other reason is abusing evidence for 1 in favor of an argument for 2?

    When we’re talking about education metrics for the general public, the problem is as obvious as it is politically unmentionable: kids have very different levels of ability in ways that show typical statistical disparities between population groups, and so it’s completely absurd to have a one-size-fits-all approach to their education or a single absolute standard for measuring their progress. Instead, we could do our best to assess each kid’s expected potential and teach them accordingly. If we are going to measure teachers and schools, we should be looking for whether the progress of those kids is consistently exceeding or falling short of those individualized expectations. Teachers don’t deserve special credit for a school full of smart kids testing smart, or deserve blame for a school full of slower kids testing slow. They also don’t deserve blame for kids who score lower because they can’t speak the language, or because there are disciplinary problems in the classes that the teachers and schools are effectively legally prohibited from addressing.

    But the obvious problems with stupid and unfair assessment systems weigh completely against Muller’s point about the problem: “… when testing becomes counterproductive and pernicious is when it’s connected to reward and punishment.”

    Nah. It’s bad when the rewards and punishments are stupid and unfair. That doesn’t mean there aren’t smarter and fairer ways to do it – though perhaps it’s valid to argue that since we’ll never get anywhere near that point in our broken ideological culture, one might as well not quibble. Because of what is unmentionable, a lot of this discussion has to take place on a quasi-Straussian level. If, when Muller says “we shouldn’t do testing,” he’s understood by people reading between the lines to mean “the stupid and unfair testing we can’t criticize and must all unfortunately accept as a political given,” then maybe it’s defensible, but it seems to me he’s not doing that.

    At one point in the discussion about Caplan (who I expect to defend himself quickly and with typically devastating force) Muller talks about unmeasured engagements inspiring students to think “dialogically and dialectically”. Well, ok. But in my many years of schooling, the correlation between test scores and students best able to demonstrate that kind of thinking was extremely strong.

    Indeed, the levels of practically all the positive and desirable intangible influences Roberts and Muller believe happen in the educational process, and which I could observe in my classmates, seemed to correlate strongly with those scores. At any rate, it is a little bit ridiculous to expect that the experiences of professors with students several standard deviations above the population mean and attending top-tier institutions could somehow be generalized to what unmeasurable things could theoretically be going on in the mass education system.

  4. Have each employee interview elsewhere, say every two years, and bid.

    Yuk, yuk. But the other idea was a social network with intelligent and secure, anonymous resume tools. Employees and hiring managers exchange enough info to decide on a personal interview. Up to the interview, the secure bot can produce an anonymous distribution of information trends. The bot does no more.

  5. Funny how the discussion on education metrics and Caplan pretty much recapitulated your blog post on the topic from back in February.

    Contrasting the misuse of military statistics in the Vietnam era that you described in your essay with current-day DOD metrics, I would argue that DOD has only grown better at subversion. Take, for example, their most recent annual Government Performance and Results Act (GPRA) report (http://dcmo.defense.gov/Publications/Annual-Performance-Plan-and-Performance-Report/). Page 10 of the 2017 report states:

    “At the end of the fourth quarter in FY 2017, the Department met or exceeded 48 percent of its performance targets. The Department had not met 53 percent of its targets.”

    Honest and objective reporting? Complete indifference to the perception of being a poor performer? Exceptionally challenging targets? Or strategic budgetary gamesmanship? They got the money, so I am betting on the last. The iron-clad rule of appropriations is that they reward failure, so if you are not failing, create the illusion that you are. And we will never know if the extra money makes a bit of difference or if funds could have been redirected within the base to address higher priorities.

    Jerry Ellig, who used to be at Mercatus, wrote a lot about GPRA about eight years ago. It would be interesting to have the two Jerrys discuss GPRA.

    Not that GPRA will ever be repealed even if it is conclusively demonstrated to be a waste. There is too much of an industry built up around preparing and publishing this stuff.

    Yet, one wonders how far judgment will actually get anyone in achieving necessary reforms to the administrative state. Probably not very far. We will simply have to wait for Venezuelan-style rock bottom before we get an upward turn. This might suggest that humility and restraint are strongly indicated. And, yet again, it makes the forlorn and defeated case for limited government.

    At any rate, a fine public service rendered by your diligence in keeping this pot stirred.

    • True, but when they continue to do so “regardless” in the type of pursuits that real market fundamentalists tend to partake in, a nasty metric called “P/L” begins to suggest to them that they are wrong. I would like to know how non-market fundamentalists challenge their biases…

      • There is no P/L in education.

        I think his point is that trying to apply market principles to non-market situations is doomed to fail. No market can make low IQ kids score better on tests (P/L). So all the incentive payments in the world won’t improve results the same way a higher commission might make a salesman sell more in a market situation.

  6. On the flip side metrics were great for winning baseball games.

    The best player of Chess or Go is neither a computer nor a human being, but an intelligent human using a computer as an assistant to enhance their abilities.

    Metric analysis alone and “gut” alone are both inferior.

    One thing I think the metric people underestimated is how metrics alone couldn’t win ideological battles or solve for bad social dynamics. Metrics are a tool; they can be used for good and ill. In many cases “the data happens to confirm my previous desire” can be supplied by sufficient data massaging and used to shut down critical thinking.

  7. With respect to the creation of wealth, thru an economic system, the simple metric of profit and the single “maximize profit” goal has been hugely successful.

    It’s been so successful that a) “management by objective” is being tried even where there is no profit metric, and b) there is political pressure for successful corporations to create other goals and serve other “stakeholders”, rather than capital investing shareholders.

    Most other spheres do NOT have such a positive single metric.

    And even in firms, “gaming the profit” can be done to some extent, through unpaid overtime work as well as illegal actions (like employing illegal aliens).

    There was a very important comment on schools:
    half the schools are below average.
    half the students are below average.

    The fact that there are race-based correlations with respect to which half any student belongs to makes it almost impossible to be honest in discussing education.

    See how Charles Murray is treated for his mostly pretty good book on IQ, The Bell Curve.

    If one can’t honestly use honest metrics to know what’s honestly going on, why should we be surprised at the programs which fail which were based on dishonesty to start with?

    Black education in gov’t schools is a failure. By every metric. (I wish it was “Government Choice” theory instead of Public Choice, because the failures are gov’t failures, not failures of the public.)

    Vouchers will help, but won’t close the gap.
    Tho separating the black-white students into groups based on living with married parents or not would help:
    blacks living with married parents vs whites living with married parents,
    blacks not living with married parents vs whites not living with married parents.
    Plus,
    whites living with married parents vs whites not living with married parents.

    There’s both a black-white gap, and married parent or not gap.

    People who choose to have sex outside of marriage increase the likelihood of having kids outside of marriage, and the likelihood of worse education results.

    Can we honestly talk about that yet?

  8. A couple of years ago parents in my state pushed back against the insane amount of standardized testing taking place. Because of all the attention from the grassroots, a bill was passed that allowed parents to opt out of the state testing.

    This year I talked to a parent with two children in the same school. The teacher of the child who tends to score lower on standardized tests suggested that she could opt out of the testing — to avoid the undue stress, of course.

    The teacher of the child that tends to score well did not even mention that opting out was a possibility.

    I’m sure that’s an unintended consequence of the bill that was passed, but I wonder how much of this is going on.

    • A major reason we have standardized tests in the first place is

      1) The people who teach students are also the people who grade students;
      2) Teachers whose students get bad grades are considered bad teachers. They get grief from parents and administrators. If they are beginning teachers, they do not come back the next year.
      3) There is a tremendous incentive to pass almost all students.
      4) Lots of students graduate without having learned much.

      Standardized tests were meant to get around this by having outsiders assess how much students have learned. Not surprisingly, many parents would rather get nice news from teachers than bad news from standardized tests.

      (The general failure of “pay for performance” suggests that we expect schools to be way more effective than they can reasonably be, and that presently they are doing “about as well as can be expected.”)
