The essay on the null hypothesis and Charles Murray

Posted on February 20, 2020 by Arnold Kling

I am posting it below, because so many readers complained about Thinkspot. It is true that Thinkspot is not in a satisfying state as is. Please comment only on the essay. I will put up a separate post on the issues with Thinkspot.

1. If the shared environment explains little of the variance in cognitive repertoires, and
2. If the only environmental factors that can be affected by outside interventions are part of the shared environment,
3. Then outside interventions are inherently constrained in the effects they can have on cognitive repertoires.
–Charles Murray, Human Diversity, Chapter 13.

As an example of an outside intervention, consider reading to pre-school children. Researchers have observed that pre-school children who have been read to a great deal by their parents subsequently perform better in school than students who have not been read to as much.

But this relationship is not necessarily causal. It could be that the better school performance is due to inherited characteristics that are correlated with how much reading the parents do to their pre-school children. In order to establish causality, one would have to conduct an experiment in which children are randomly selected into a control group that receives little reading and a treatment group that receives a lot of reading.
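The logic of that paragraph can be sketched as a toy simulation. Everything in it is an assumption made for illustration, not data from any study: a single hypothetical inherited trait drives both how much a child is read to and how well the child does in school, and reading itself is given zero causal effect by construction. Under those assumptions the observational correlation between reading and scores should come out near 0.5, while randomly assigning children to groups should leave roughly no gap between them.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (observational correlation, experimental gap) for a toy model.

    Hypothetical model: reading has NO causal effect on scores; both
    depend on the same inherited trait plus independent noise.
    """
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n)]

    # Observational data: parents' reading tracks the inherited trait.
    reading = [t + rng.gauss(0, 1) for t in trait]
    score = [t + rng.gauss(0, 1) for t in trait]
    obs_corr = pearson(reading, score)

    # Experiment: a coin flip, unrelated to the trait, picks the treatment group.
    assigned = [rng.random() < 0.5 for _ in range(n)]
    treated = [s for s, a in zip(score, assigned) if a]
    control = [s for s, a in zip(score, assigned) if not a]
    gap = statistics.mean(treated) - statistics.mean(control)
    return obs_corr, gap


if __name__ == "__main__":
    obs_corr, gap = simulate()
    print(f"observational correlation: {obs_corr:.2f}")
    print(f"treatment minus control:   {gap:.2f}")
```

The sketch does not show that reading has no effect in the real world. It shows only that a strong observed correlation is what one would expect to see even if the true effect were zero, which is why the randomized design matters.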

If such an experiment were conducted, my prediction is that the effects on the treatment group would:

–be small to begin with.
–fade out completely within a few years, meaning that by, say, fourth grade, the treatment group and the control group would show no difference.
–to the extent that they were non-zero and did not fade out, fail to replicate in a subsequent experiment.

I call this prediction The Null Hypothesis, borrowing the statistical term for “no effect of the treatment.” My reading of the literature on educational treatments is that the null hypothesis essentially always holds. When a treatment is rigorously tested, using experimental methods, its effects are small, fade-out is complete, and/or the results fail to replicate.

Why does the null hypothesis hold for educational treatments (and, incidentally, for other policy treatments, such as the effect of job training programs on subsequent employment or the effect of health insurance on health outcomes)? Consider four factors that affect human outcomes:

1. Overall cultural environment.
2. Genetic inheritance.
3. Gestational variation.
4. Specific environmental interventions.

I believe that I have presented these in order of importance.

The overall cultural environment, or “milieu” as Murray calls it, clearly matters. If you could transport one of your children to a different historical period or to a totally different society, then you could be sure that the child’s outcomes would be affected. The Flynn Effect, in which average IQ changes across generations, is indicative of the importance of the cultural environment. I think it only makes sense to talk about variations of the other three factors within a given environment, such as the affluent countries in the 21st century.

The significance of genetic inheritance is what Murray highlights. The evidence from twin studies is persuasive in that regard.

Murray does not discuss gestational variation, but Kevin Mitchell’s Innate highlights its importance. Mitchell argues that some of the variation between identical twins in cognitive repertoires is due to mutations and other accidents that occur as the fetal brain forms.

In twin studies that account for variation as the sum of genetic variation and variation in the “shared environment,” the innate gestational variation tends to be misleadingly attributed to the “shared environment” component. I believe that this leads people to be more optimistic about the potential for specific interventions than is warranted.

In my view, once we have accounted for the differences created by the overall cultural environment, genetic inheritance, and gestational variation, there is very little room for specific interventions to make a difference. In The Nurture Assumption, Judith Rich Harris pointed to evidence that parental behavior makes little difference in children’s outcomes. If the people who are most heavily involved in raising children make little difference, then what is the likelihood that, say, a particular elementary school teacher or a specific schooling method will make a difference?

I know that there are studies that purport to find exceptions to the Null Hypothesis. Such studies receive wide acclaim. But these tend to be one-off results that do not replicate.

I plan to write subsequently on points where my view differs from Murray’s. But on the Null Hypothesis, my views are consistent with his.

25 thoughts on “The essay on the null hypothesis and Charles Murray”

ksdale on February 20, 2020 at 11:16 am said:

I have 4 young children and I’ve tried to take the null hypothesis to heart. Assuming our specific interventions matter very little, what’s the best thing to focus on?

Given that kids spend such a huge amount of time in school and many (most?) of them learn so little (and given my experience of doing so much of my learning outside of school), I’m operating on the assumption that the motivation to acquire skills and knowledge and do things is more important than anything. (I realize the irony that I’m betting on *this* intervention to work.) Basically the effect of any intervention will be dwarfed by what a kid *wants* to do. So the only thing that matters is getting them to want to do things that will benefit them, and the corollary is that once you get them to want to do the right things, all you have to do is provide them with books and internet access and they’ll do the rest.

Perhaps this is like trying to manipulate the overall cultural environment? But the result is a lot of funny looks when we tell other parents how little we try to micromanage our children’s behavior (contrasted though, with the observation that we seem to have much higher expectations for our children’s behavior than most of the people we run into).
- Matt on February 21, 2020 at 7:51 am said:
  
  “Basically the effect of any intervention will be dwarfed by what a kid *wants* to do. So the only thing that matters is getting them to want to do things that will benefit them, and the corollary is that once you get them to want to do the right things, all you have to do is provide them with books and internet access and they’ll do the rest.”
  
  After a combined 50+ years in education, my parents came to the same conclusion. Kids won’t learn unless they value the education presented to them. Once they value the education, you can lead, follow, or get out of the way, but you can’t stop them from learning at that point.
Metamorf on February 20, 2020 at 11:19 am said:

I’m not clear why a “specific intervention” isn’t a part of an overall cultural environment. Depending on the scope and duration of the intervention, it may be a very small part, and likely therefore have only a very small effect, perhaps undetectable by experiment, but it seems unreasonable to assume there is some kind of clear dividing line between “intervention” and “environment”. I can understand skepticism here — the hope of finding a specific intervention that is small and well-defined, yet has a large and durable effect is no doubt remote. But its potential value makes it a worthy on-going pursuit.
Nicholas Weininger on February 20, 2020 at 11:44 am said:

What is your view of the evidence on 1:1 tutoring as an educational intervention? Other folks I respect who also take a dim view of most interventions (e.g. Freddie DeBoer, who is very leftist in worldview and forthcoming about that, but generally honest and open to evidence) have said that tutoring seems to be an exception. Not many people get a lot of it– and to declare my bias, I did get a lot of it and feel that it benefited me– so it wouldn’t necessarily show up in overall average effects of shared environment differences.
- Ninja on February 20, 2020 at 12:12 pm said:
  
  To follow up on this point, here was an article that tried to make sense of why 1:1 tutoring, coupled with “mastery learning”, was especially effective.
  
  https://nintil.com/bloom-sigma/
  
  Could this be one intervention that defies the “Null Hypothesis”?
- education realist on February 20, 2020 at 3:32 pm said:
  
  Tutoring intervention has never shown anything but immediate results over a period of a few weeks. There’s no evidence that students who were very weak become strong, no evidence that they stop needing tutoring, no evidence that they even retain the knowledge.
RAD on February 20, 2020 at 11:47 am said:

I’ve been reading through three books in parallel: Charles Murray’s “Human Diversity”, Kevin Mitchell’s “Innate”, and Robert Plomin’s “Blueprint”. What has become apparent to me is that a very distinct academic disagreement exists over the role of “peers” in the non-shared environment. Judith Rich Harris and Steven Pinker think peers are central while Mitchell and Plomin dismiss it outright. Consider the following from Innate Ch 5: The Nature of Nurture:

A good example of this is language perception. As infants are exposed to a primary language, they develop expertise at categorically recognizing the characteristic phonemes, or speech sounds, of that language. For example, native English speakers become adept at distinguishing between the sounds of “b” and “v,” or “r” and “l.” Spanish speakers, by contrast, may not distinguish so readily between “b” and “v” sounds, while Japanese speakers may have difficulties hearing a distinction between “r” and “l.” Amazingly, EEG (electroencephalogram) recordings show that the auditory regions of the brains of Japanese infants make that distinction just as well as infants exposed to English as a first language. But that ability is lost over time. The process of developing expertise to sounds in one language eventually closes off the ability to distinguish between sounds that are not heard as often or between which making a distinction has never been important. The phonemes “r” and “l” thus literally sound the same to Japanese speakers, in the way that the tonal subtleties of Cantonese may be completely lost on native English speakers. It is this loss of flexibility that explains why we lose, after a certain age, the ability to learn a second language without a telltale foreign accent.

A “telltale foreign accent” is Pinker’s first example of the peer influence at work. His second language example is the well documented language progression from Pidgins to Creoles. The Creole language is created by young peer cohorts going through the same developmental milestones. Their Pidgin speaking parents never grasp the complex structure of the Creole. Pinker makes clear that this phenomenon may be restricted only to the language modules of the mind but it is reasonable to hypothesize that it applies to other areas as well.

My question is: why don’t Plomin and Mitchell take this hypothesis seriously?
- Roger Sweeny on February 20, 2020 at 5:13 pm said:
  
  I’ve read Harris and Pinker, Mitchell and Plomin and I can’t get my brain around what the problem is here. What are Plomin and Mitchell not taking seriously? The passage from Mitchell says that he thinks milieu/overall cultural environment is important. Part of that importance comes from the fact that peers also reflect it. If the overall cultural milieu treats “r” and “l” the same, so will a child and his peers.
  
  (As I recall, The Nurture Assumption said that almost all differences came from heredity and “non-shared environment” in roughly 50-50 parts, and Harris guessed that what was important in the “non-shared environment” was peers. In No Two Alike, she tried to get beyond that but admitted that she couldn’t say much that was definite. If she were alive today and had read Mitchell, I strongly suspect that she would feel that peers are not as important as she originally thought. A good deal of the non-genetic variance doesn’t need to be explained by peer effects. Rather, it is accounted for by “gestational variation”.)
  - RAD on February 20, 2020 at 5:54 pm said:
    
    The Mitchell quote goes into detail about the one case that most people have first hand experience with: our accents match our peers, not the accent of our parents and not the accent of our teachers. He doesn’t mention peers in this context. Plomin specifically says the non-shared environment is random but he never specifies how he falsified the peer influence. His only reference to Judith Rich Harris was mentioning a popular book in 1998.
    
    Proving or disproving the peer influence for personality traits is hard. My impression from their books is that they are treating absence of evidence as evidence of absence. I think some traits fall into probabilistic brain variation scenarios like Mitchell describes while at least language is mostly influenced by peer environment as Pinker describes. Murray’s 3 points above ignore interventions aimed at peers because no one has done the hard work of isolating the details of peer influence, as far as I can tell.
    - Roger Sweeny on February 20, 2020 at 7:02 pm said:
      
      So Mitchell does seem to take peer effects seriously, at least for some things.
      
      I get the feeling that with the four of them, it’s not a matter of disagreement so much as a difference of emphasis.
      
      “Proving or disproving the peer influence for personality traits is hard. … Murray’s 3 points above ignore interventions aimed at peers because no one has done the hard work of isolating the details of peer influence, as far as I can tell.”
      
      Amen.
      - RAD on February 20, 2020 at 7:44 pm said:
        
        No, Mitchell’s discussion of language accent is fully quoted. There is nothing before or after that mentions the possibility of peer influence. The “Nature of Nurture” chapter is exactly where that discussion should take place or at least be mentioned. Plomin and Mitchell just leave the peer part to the reader’s imagination. It is not a difference in emphasis, it comes across as an awkward silence.
      - Roger Sweeny on February 20, 2020 at 8:45 pm said:
        
        I’m confused. Does Mitchell say, “our accents match our peers, not the accent of our parents and not the accent of our teachers.”? In that case, it’s peer effects. Or does he say nothing more than the passage at February 20, 2020 at 11:47 am? Which seems to make no distinctions between relatives, peers, and unrelated outsiders–treating them all as one homo-linguistic unit.
        
        I’m thinking now it’s the second, and the quote about peer accents is you.
      - RAD on February 21, 2020 at 5:55 am said:
        
        “I’m thinking now it’s the second, and the quote about peer accents is you.”
        
        It’s the second. The words are mine. I was trying to channel Pinker/Rich-Harris and explain the major contribution peer environments play in language accents. Many people have first hand experience with this truth. Mitchell’s passage at February 20, 2020 at 11:47 am is used as an example of brain plasticity in his book, the pre-wired to hard-wired phase, but he is silent on the peer-environment aspect of this phenomenon.
      - Roger Sweeny on February 21, 2020 at 9:34 am said:
        
        So it looks like you can break down “why are you the way you are?” into 2 big boxes and 5 sub-boxes. One big box is what you’re stuck with at birth (aka Nature). That breaks down into genes and “gestational variation”. If, like most everyone in politics or academia, you want to change the world for the better, then you want this box to be small. But if you believe Plomin and Mitchell, it accounts for maybe 70% of the variation in the U.S. population.
        
        The second box is things potentially changeable (aka Nurture). It includes the family environment (aka shared environment), a person’s peers, and the general cultural milieu, including school. Changes here can be powerful. Drafted soldiers will kill on command. But often they are not. Forcing everyone to take algebra in 8th grade or pushing them to take AP Calculus doesn’t seem to lead to much increase in math skill.
      - RAD on February 21, 2020 at 10:11 am said:
        
        “Forcing everyone to take algebra in 8th grade or pushing them to take AP Calculus doesn’t seem to lead to much increase in math skill”
        
        I’m not sure this is true. This is knowledge captured in culture and there has been little work on figuring out the what/when/who/how of teaching. Culture has learned that Latin and Greek don’t help society when taught in aggregate and we probably underestimate the impact of the “ABC Song” that is taught in the anglosphere. The thing we know about AP Calculus is that the variation in teaching skills probably doesn’t matter but I’d be cautious about dropping the course from the curriculum.
      - Roger Sweeny on February 21, 2020 at 10:19 am said:
        
        I wouldn’t drop it. Just realize that only a small proportion of 17 year olds will get much from it.
      - RAD on February 21, 2020 at 10:34 am said:
        
        Well I think the “AP” part has already addressed the specialized stream. I’d love to figure out the core shared knowledge parts of calculus that help society. My intuition is that slopes and area under the curve are important. We live in a world of charts that represent data. The Bell Curve and its S-Curve integral are really useful general purpose tools. I think it is these types of heuristics that account for the Flynn Effect. Much of the AP type curriculum serves as an IQ proxy. It is sad that society is not obsessed with improving education using these levers.
      - Roger Sweeny on February 21, 2020 at 11:28 am said:
        
        Years ago, the “AP” part meant that people who wouldn’t get much from AP courses didn’t take them. But then, people in the ed business said, “People who take AP courses are more likely to go to college and to go to better colleges. If we encourage more of our students to take AP courses, we can get that good result here.” For a while, there was even a movement to require everyone to take at least one AP course.
        
        There was also a feeling that limiting AP courses to people who had done very well previously closed doors to under-represented minorities. People in the business are driven crazy by the persistence of “the gap” between ORM and URM, and this looked to them like one way to close it. The results have not been as hoped.
  - RAD on February 20, 2020 at 6:23 pm said:
    
    As a possible mechanism that explains why home environments have so little influence, consider the Westermarck effect. Close proximity during a crucial development stage may have an inhibitory effect.
chedolf on February 20, 2020 at 11:48 am said:

Thanks for posting it here. Was eager to read this when you described it in the earlier post.
asdf on February 21, 2020 at 8:41 am said:

What does “Overall cultural environment” mean here?

“I think it only makes sense to talk about variations of the other three factors within a given environment, such as the affluent countries in the 21st century.”

So culture is the most important item, but nearly all major societies today are functionally equivalent enough that it isn’t a big deal? It doesn’t explain differences in outcome today?

“The Flynn Effect, in which average IQ changes across generations, is indicative of the importance of the cultural environment.”

How? What’s important about the Flynn Effect? Does it measure “intelligence” in the sense of being able to achieve socio-economic outcomes? Our grandparents weren’t retarded (as the Flynn effect might have one believe). “Culture” might indeed have made us better at certain non-g loaded IQ subtests…but is that “important”?

Culture is placed first on this list, but we aren’t given any indication what about culture is important to achieving what desired outcome and how it’s relevant to explaining the world today.
Roger Sweeny on February 21, 2020 at 2:15 pm said:

“Murray does not discuss gestational variation, but Kevin Mitchell’s Innate highlights its importance. Mitchell argues that some of the variation between identical twins in cognitive repertoires is due to mutations and other accidents that occur as the fetal brain forms.”

I don’t think that is quite right. Mitchell hardly talks about mutations since they are so uncommon between identical twins. He also doesn’t think about gestational differences as “accidents”. His explanation is more radical, in the sense of “going to the root”. The process of going from a single fertilized egg to a trillion differentiated cells is not one where there is a well-defined end-product. There is no “blueprint” in the genes. What the genes do is specify a process, a process that is fuzzy. The process is analog rather than digital, with most steps happening within a range–and, of course, thousands of processes are happening in parallel. If you made a thousand clones, all of which had exactly the same DNA, each one would be different when it was born.

Gestational variation is not an accident. It is normal.
- Arnold Kling on February 21, 2020 at 2:25 pm said:
  
  Some of the accidents are mutations that occur during gestation, at least if I understand him correctly. The mutations may only affect particular cells, not the person’s basic genome. Keep in mind that I know no biology.
- RAD on February 21, 2020 at 2:37 pm said:
  
  I’m pretty sure Kling’s description of Mitchell’s view on mutations is accurate but you have to understand the context of the type of mutations involved. These are mutations involving the division of soma cells, not germ line cells. If you were to sequence the DNA of every cell in your body, you would find many independent errors. These errors are one source of randomness that occurs during gestation.
- Roger Sweeny on February 21, 2020 at 4:12 pm said:
  
  I completely agree that these mutations can have an effect. I just think they are not Mitchell’s focus. My big takeaway was just how normal variation is, with mutation being only one (and a small) reason why.
  
  Kind of like: If a million people flip a coin a hundred times, the most common result will be 50 heads and 50 tails. But the vast majority of people will get a different result. 51 heads and 49 tails will be only slightly less common. The difference between the two will not be the result of accident. It will be simple ordinary randomness. Nor can you blame 53 heads or 54 heads on accident, though more heads will become increasingly less common. Gestational development is like that, just many, many, many times more complicated.
  
  The womb is pretty well insulated from the outside world but there are some things that are not simple randomness, that might be called accidents: the mother takes thalidomide during a certain part of the pregnancy causing birth defects; the mother drinks heavily during pregnancy causing fetal alcohol syndrome; the mother doesn’t realize she is pregnant and continues taking birth control pills during the first several months of pregnancy, resulting in “non-traditional” sexuality.
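
The coin-flip comparison in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch assuming a fair coin and independent flips; the helper name `prob_heads` is made up for the illustration. It computes the exact binomial probability of each head count in 100 flips.

```python
from math import comb


def prob_heads(k, n=100):
    """Probability of exactly k heads in n independent fair coin flips."""
    return comb(n, k) / 2 ** n


if __name__ == "__main__":
    for k in (50, 51, 53, 54):
        print(f"P({k} heads) = {prob_heads(k):.4f}")
    # Share of people who do NOT land on exactly 50 heads.
    print(f"P(not 50 heads) = {1 - prob_heads(50):.4f}")
```

Exactly 50 heads is the single most likely outcome at about 8 percent, so about 92 percent of flippers land somewhere else, and 51 heads is less likely than 50 only by a factor of 50/51.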

Comments are closed.
