Experiments vs. tampering

Alex Tabarrok wrote,

the experiment forces people to reckon with the idea that even experts don’t know what the right thing to do is and that confession of ignorance bothers people.

Recently linked by Tyler Cowen.

W. Edwards Deming distinguished experiments from tampering. With an experiment, you change a process and explicitly compare the results to a baseline. With tampering, you change the process without rigorously examining the results.

For example, in education, most curriculum changes involve tampering. Schools rarely test to see whether a curriculum works.

I once sat next to a high official in the Department of Education, and he was horrified when I suggested experiments in education. “Would you want your child to be part of an experiment?” he asked, incredulously. “The schools do it all the time,” I responded. “They just don’t bother checking to see whether their experiments work.”

Another example is the pandemic. When I complain about the unwillingness of health officials to conduct experiments to see what factors affect the spread of the disease, few people agree with me (readers of this blog are an exception). They quickly invoke Joseph Mengele.

But nobody invokes Joseph Mengele when it comes to lockdowns, which are simply experiments whose results are not rigorously evaluated by those who conduct them.

It is very hard to make a moral case against experiments that is not also an even stronger case against tampering. But we have a much higher tolerance for tampering than for experiments. I am inclined to fall back on Alex’s answer. Saying that you are conducting an experiment implies that you are uncertain. Tampering implies that you know what you are doing. Sadly, people have a higher tolerance for tampering.

23 thoughts on “Experiments vs. tampering”

  1. There is a third option which provides a more rounded picture – accountability. Making a decision, evaluating the results, and taking the blame – that’s leadership with accountability. Making a decision and not evaluating the results is tampering. Experimenting is making a lower-regret decision to try something with some safeguards in place, and then evaluating the results before making a final decision.

  2. You nailed it.

    Moreover, an experiment must (a) exhibit strong effects and (b) replicate to count as good evidence for policy-making on a large scale or for the long term. Yet investigators often conclude a research paper about an experiment by declaring “policy implications.”

  3. Lockdowns are political moves to partially assuage public fears previously stirred up by politicians who want to appear highly concerned about a threat to safety, but don’t want those fears to get out of their control. Their real attitude towards the risks can be judged by their behavior, i.e., ignoring the rules they create. Think of Gov. Newsom’s French Laundry dinner.

    The situation can be viewed in light of Mencken’s famous quote:
    “The whole aim of practical politics is to keep the populace alarmed (and hence clamorous to be led to safety) by menacing it with an endless series of hobgoblins, all of them imaginary.” In the case of the pandemic, the hobgoblin is not imaginary, but the actual dangers can be greatly exaggerated for the same ends, for example by ignoring that the young are little affected, and that the disease mostly attacks the very elderly and infirm, many of whom were not long for this world in any case.

    So there was never any intent on the part of those instigating the lockdowns to learn whether they were effective; they are a political tool, and will be set aside as needed. Now we read that the NY and Chicago mayors are calling to end them on the grounds of economic damage, and mirabile dictu, studies are appearing showing them to be ineffective. But all this has long been known.

  4. Except that we are discussing human behavior. Experimenting has limited value. Group dynamics are chaotic and human beings think marginally. It is nearly impossible to model an experiment that captures the full range of possible elements that affect success and failure, and we all know people adapt and learn new marginal decisions.

    The lockdown analogy is perfect. We can make this decision as simple or as complex as we wish, but if we see something dangerous, we avoid it. In the end, we are forced to make judgements about establishing healthy patterns of behavior, and to establish strong cultural commitment to those patterns, or not.

    Usually, quibbling about the nuances of the idea matters less than widespread commitment to that idea. Lots of good ideas don’t work because of a lack of social cohesion. Lots of questionable ideas do work because people choose to make them work.

    Experiments do not capture the complexity of establishing and maintaining an ethic of cooperation across a wide array of circumstances. We just know historically that it can easily go many different ways. We know we can cooperate and live well, or we can kill each other, or something in between, because we have done all of those things many times before.

    Cooperation delivers the most human value, and it is often an ethereal thing. Experiments are often just BS.

    • This may be the case for traffic laws (everyone following the same ones is more important than finding the optimal ones). It’s probably not the case for lockdowns (all of us committing to the lockdown only makes it more effective if it actually works in a more absolute sense). It’s definitely not the case for education (if we all just agree that it’s part of our culture that differential equations don’t matter as much as Sylvia Plath, differential equations don’t stop actually mattering).

      • I am not arguing that cooperation guarantees success. I am arguing that lack of cooperation guarantees failure, and it undermines the value of experimentation in the political sphere.

        • I don’t think that it does guarantee failure, though. For some ‘treatments’ (as with drugs), noncompliance by participants reduces how informative the experiment is, but only to the extent of the noncompliance. There are no network effects; if half the participants don’t take the drug, it still works for the half that do. I think this probably applies to most experiments in education: total cooperation isn’t necessary to detect whether something works. For modifying traffic laws, total cooperation is necessary. But I don’t think it’s necessarily true that most experiments in public policy are like traffic laws rather than pharmaceuticals.

          • You keep switching perspectives. I can take a pill, and it works, and you can be uncooperative and not take a pill, and it doesn’t. Sure.

            But the point here is experimentation and public policy, not the granular success or failure of a pill. The experiment, and judgements about its success or failure, largely hinge on durable rates of success as justification.

            You can run an experiment and get one answer, but you can’t anticipate the divergence in basic cooperation of stakeholders between the experiment and the public rollout.

            You can’t know if some politicians or pundits will undercut trust, or budgets, or authority, or instead support the efforts in good faith.

  5. Or it’s much more insidious than that and most don’t even know what an experiment is or how knowledge is gathered, since only the most cursory thimble of our stock of knowledge ever entered into their dull “minds.”

    With the wealth of information available now, it is really remarkable the spread you can see between those few who genuinely hunt for that knowledge (hint: it’s not the so-called experts most of the time) and the vast majority who resolutely ignore it, choosing to numb themselves with cat videos or Pelosi pronouncements instead.

    • I long for a world where more people would leave everyone else alone and just watch cat videos.

      • Agreed. Twitter didn’t go far enough. They should just ban everyone that talks about politics at all. It would be much better.

  6. Why would politicians want anyone to determine the results of their tampering / policies? That would make them accountable. Incentives matter.

  7. I think it’s even simpler than what Alex was describing.

    People don’t like the idea of being “experimented on” because it sounds insulting, disrespectful, degrading. Lab rats get experimented on, bacteria get experimented on, potatoes get experimented on – these things are all subhuman. Whatever humans you accept are being “experimented on” are humans you’re deeply disrespecting. Disrespect is a serious problem for our monkey-brains; it was once a matter of life and death.

    To test this theory, we would need to find a correlation between levels of respect, or levels of perceived respect, and feelings towards experimentation on those groups.

  8. Along the tampering line, Deming was also fond of the statistical process control charts developed by Shewhart. Tampering in that context refers to making changes to a process that is in control (within +/- 3 standard deviations of the mean); it occurs when people make adjustments based on differences that are in reality just noise. When you look at the variation within and between geographic areas, it’s hard to say there is not a lot of noise (the standard deviation is very large), so for the most part every reaction is tampering.

    What it looks like to me is “cases/deaths or whatever metric is increasing so we need to implement policy x”… then “cases are even higher, we need to strengthen policy x”. The possibility that it is just noise and policy x has no impact is never considered.
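    A minimal sketch of the control-chart logic described in this comment, in Python. Only the ±3-standard-deviation rule comes from the comment; the baseline series and the new observations are invented for illustration.

    ```python
    # Shewhart-style check: control limits come from a baseline period, and a new
    # observation counts as a real signal only if it falls outside mean +/- 3
    # standard deviations of that baseline. Reacting to points inside the limits
    # is what Deming called tampering. All numbers here are made up.
    import statistics

    baseline = [102, 97, 110, 95, 108, 101, 99, 104, 96, 105]  # e.g. weekly cases
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    upper, lower = mean + 3 * sd, mean - 3 * sd

    def classify(observation):
        """Say whether a new point looks like signal or ordinary noise."""
        if observation > upper or observation < lower:
            return "outside the control limits -> investigate, maybe act"
        return "inside the control limits -> likely noise; adjusting is tampering"

    for new_point in (109, 91, 140):
        print(new_point, classify(new_point))
    ```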

  9. I think it also comes down to notions of fairness and direct causation.

    In an experiment, people are subjected to at least two different situations (treatment vs. placebo). People generally expect one of the two groups to be better off than the other. If you have a teaching innovation you think will help kids learn better, people notice that if you’re right, the kids who do not receive the innovation will be harmed. If the innovation seems ludicrous, then the students receiving the intervention will be harmed. Related to this is that I think people have excessive confidence in their own untested opinions.

    When it comes to medical experiments, say testing COVID vaccines on willing participants who are directly infected, there’s a direct causal effect of the researcher giving COVID to the people involved and any adverse outcomes.

    I’m not a utilitarian but I will also note that people given appropriate knowledge should be allowed to take risks, especially risks for the greater good (arguably no different than serving in the military).

  10. In the U.S. and other countries, some academics have been arguing for evidence-based policy-making. The meaning and practice of evidence-based policy are easily contested, however. Tampering with the evidence (from history, simulations, and experiments) is possible, and more importantly “the evidence” shows that tampering is highly probable. Once we recognize the great diversity and complexity of social reality, we should not be afraid of asking academics to be accountable for the knowledge they claim to have produced. It has become too common to end published research with a concluding section that entertains additional ideas, “to increase the value-added by the authors” for policy-making. Also, we cannot ignore that today too many academics depend on politicians and government officials to fund their research.

    Academic knowledge should never be ignored when it is reliable and relevant, but it is never sufficient for policy-making. If policy-makers ignore relevant and reliable academic knowledge, they should have other knowledge that justifies ignoring it. There is no need to reinvent the wheel, because this problem has been discussed for a long time in the context of judicial decisions. (Some may argue that judicial decisions are quite different. Yes, they are different, but not different enough to ignore that discussion.)

    • Yes, Arnold, the problem is not experiments vs. tampering, the problem is evidence tampering in the production of knowledge (Akerlof and Shiller should have focused their “Phishing for Phools” on universities). Evidence tampering seems to be a much more common problem than fraud in your country’s elections (except, of course, for the last presidential election).

      In the case of Alex and Tyler, they rely heavily on virtue-signaling to ignore how unreliable most of the evidence is (Yes, trust me, I’m doing the hard work of reviewing the evidence for you).

  11. Imagine there were a lockdown-equivalent vaccine. Even the most lockdown-friendly studies put the effectiveness of lockdowns somewhere in the neighborhood of a 50% reduction in transmission. There’s no way a vaccine would be approved if the treatment group had only 50% fewer COVID cases but suffered all the lockdown-related side effects: higher suicide rates, increased violent crime, a 1-in-20 chance of being unemployed.
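    As a back-of-the-envelope illustration of what a “50% reduction” means in trial terms, here is a hypothetical relative-risk-reduction calculation; the counts are invented, not taken from any study.

    ```python
    # Hypothetical illustration of "50% fewer cases in the treatment group":
    # relative risk reduction = 1 - (attack rate under the intervention /
    # attack rate without it). All counts below are invented.
    def relative_risk_reduction(cases_treated, n_treated, cases_control, n_control):
        risk_treated = cases_treated / n_treated
        risk_control = cases_control / n_control
        return 1 - risk_treated / risk_control

    # 50 cases out of 10,000 with the intervention vs. 100 out of 10,000 without
    print(relative_risk_reduction(50, 10_000, 100, 10_000))  # 0.5, i.e. a 50% reduction
    ```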

    • I haven’t seen a study that shows a 50% reduction in transmission during a lockdown. Do you have a link? Studies in the spring showed essentially no effectiveness of lockdowns in France, Spain, or the UK, as case growth was already declining fast prior to the lockdowns. A study in Norway concluded the same thing about that country and concluded its lockdown was a mistake.

      A study was recently published in the European Journal of Clinical Investigation that analyzed coronavirus case growth in 10 countries in early 2020 including the U.K. and concluded that there was no clear effect for lockdowns slowing the virus.

      “We first estimate COVID‐19 case growth in relation to any NPI implementation in subnational regions of 10 countries: England, France, Germany, Iran, Italy, Netherlands, Spain, South Korea, Sweden, and the US. Using first‐difference models with fixed effects, we isolate the effects of mrNPIs by subtracting the combined effects of lrNPIs and epidemic dynamics from all NPIs. We use case growth in Sweden and South Korea, two countries that did not implement mandatory stay‐at‐home and business closures, as comparison countries for the other 8 countries (16 total comparisons).

      “After subtracting the epidemic and lrNPI effects, we find no clear, significant beneficial effect of mrNPIs on case growth in any country. In France, e.g., the effect of mrNPIs was +7% (95CI -5%-19%) when compared with Sweden, and +13% (-12%-38%) when compared with South Korea (positive means pro-contagion).”

      https://onlinelibrary.wiley.com/doi/abs/10.1111/eci.13484
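      To make the quoted method a little more concrete, here is a rough sketch of the first-difference comparison idea: compare day-over-day growth in log cases for a country with mandatory restrictions against a comparison country without them. This is not the authors’ code, and the case counts are invented; it only illustrates the kind of quantity the study compares.

      ```python
      # Rough sketch (not the study's code): epidemic growth rate approximated by
      # first differences of log cumulative cases, compared between a country with
      # mandatory NPIs and a comparison country without them. Counts are invented.
      import math

      def growth_rates(cumulative_cases):
          logs = [math.log(c) for c in cumulative_cases]
          return [b - a for a, b in zip(logs, logs[1:])]

      with_mandates = [100, 140, 190, 250, 320, 400]   # invented daily cumulative cases
      comparison    = [100, 135, 180, 235, 300, 375]   # invented, no mandatory NPIs

      # Positive differences mean faster growth under mandates; the study's claim is
      # that such differences were small and not clearly distinguishable from zero.
      diffs = [m - c for m, c in zip(growth_rates(with_mandates), growth_rates(comparison))]
      print([round(d, 3) for d in diffs])
      ```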

  12. I once sat next to a high official in the Department of Education, and he was horrified when I suggested experiments in education. “Would you want your child to be part of an experiment?” he asked, incredulously. “The schools do it all the time,” I responded. “They just don’t bother checking to see whether their experiments work.”

    OK, I don’t know who the high official in the DoE was, but lots of experiments are done in education, and we do know about them, because there’s research, and you, Arnold, know this because that’s why you have the Null Hypothesis in Education. And I’m pretty sure the DoE guy knows that, so whatever he was shocked by was either something much less anodyne or he was pretending.

    As for curriculum, we do research its effectiveness, and it’s all worthless, because, as all the research says, teachers do whatever the hell they think is best for kids. The reason they do this is that curriculum is always one-size-fits-all, and it has to be, because we aren’t allowed to track and ability-group. Thus teachers are stuck with a range of abilities that the curriculum doesn’t acknowledge, except when it has these little subsections called “Differentiation” or “Resources” which suggest that, say, when you’re teaching factoring you’ve got kids who don’t quite get it the first time round, when in fact the problem is that you’ve got kids who count on their fingers to multiply, and a question like “what multiplies to -27 and adds to -6?” is a four-minute exercise.

    Or you’ve got this fabulous phonetics curriculum for reading support but half your kids already read so they don’t need it, and a quarter of them benefit from the curriculum but the other fourth doesn’t recognize all the letters yet.

    The real problem with people talking about education is that most of them honestly don’t understand what the hell they’re talking about.

    We don’t “experiment” on kids. We are legally required to ignore reality or get sued, and unless you have plans to fix that, all the snooty talk about what “schools” do is not getting you anywhere, because it’s not “schools”. It’s teachers. And we’re legally allowed to do it.

  13. Tampering implies that you know what you are doing. Sadly, people have a higher tolerance for tampering.

    Not quite – they have a higher DESIRE for certainty. Less uncertainty. Most folk, most of the time, don’t want “sometimes this way, sometimes that way”.

    On populism, “demagoguery” wasn’t noted. But the worst form of populism is when many, or most, folk support a demagogue who is “certain” about his own beliefs. This certainty, so similar to sincerity, is often more attractive than a more honest admission of slight or large uncertainty.

    The Truth, for many questions, is “we don’t know.”

    There are also the unfortunate cases where we do know, like how bad promiscuity is for kids, but we don’t like the truth, so we prefer one untruth or another, or yet another. Differences in individual IQs are unfair, and differences in group IQs are unfair to groups – but we don’t like that truth.
