Genes and heritability: from the comments

At least two commenters pointed to an article indicating that using genes to predict height has become more effective.

One of them wrote,

Furthermore, the DNA chips used in today’s genome-wide association studies contain a few million variants at most, so these studies cannot even in principle recover the full heritability which is strongly influenced by very rare variants.

This is the answer. Polygenic scores are, for now, based on SNPs. Whole-genome sequencing (WGS) recovers full heritability for height.

As the other commenter put it,

in a short amount of time, we’ve gone from “17%” as “most predictive”, to another study saying 40%, to a new one getting close to the heritability range.

Suits vs. geeks in the virus crisis

1. Allison Schrager writes,

Among the unknowns about the virus: the true hospitalization and death rates; how infectious it is; how many asymptomatic patients are walking around; how it affects young people; how risk factors vary among different countries with different populations, pollution levels and urban densities. It seems certain the virus will overwhelm hospitals in some places, as it has in China and Italy. We also don’t know how long these extreme economic and social disruptions will last. Without reliable information, predictions are based on incomplete data and heroic assumptions.

…The way forward is testing as many people as possible—not only people with symptoms. Some carriers are asymptomatic. California is starting to test asymptomatic young people to learn more about transmission and infection rates. Testing everyone may not be feasible, but regularly testing a random sample of the population would be informative.

This is the analytical mindset, which is sorely needed. What I called the “suits vs. geeks divide” in 2008 is haunting us again. Ten days ago, the challenge was to get the suits to understand exponential growth. Hence, they were two weeks behind. Now, the challenge is to get the suits to make decisions based on rational calculations as opposed to fears or whoever shouts the loudest in their ears.

But much needs to change. Think about the “analytics revolution” in baseball. In the 1980s, the revolution started*, with Bill James and others questioning the value of the routinely-calculated statistics. Just as one example, data geeks discovered that a batter’s value was better measured by on-base percentage than batting average, even though the latter was prominently featured in the newspapers and the former was not. Soon, the geeks started longing for statistics that weren’t even being kept, and they started efforts to track and record the desired metrics.

(*In 1964, Earnshaw Cook wrote an analytical book, but he drew no followers, probably because personal computers had not yet been invented.)

Based on what we are seeing now, I think that epidemiology is ripe for an analytics revolution. To me as an outsider, the field relies too much on simulations using hypothetical parameters and not enough on identifying the data that would be useful in real time and making sure that such data gets collected.

2. James Stock writes,

A key coronavirus unknown is the asymptomatic rate, the fraction of those infected who have either no symptoms or symptoms mild enough to be confused with a common cold and not reported. A high asymptomatic rate is decidedly good news: it would mean that the death rate is lower, that the hospital system is less likely to be overrun, and that we are closer to achieving herd immunity. From an economic point of view, a high asymptomatic rate means it is safe to relax restrictions relatively soon, and that hospitalizations can be kept within limits as economic activity resumes.

Conversely, a low asymptomatic rate would require trading off losing many lives against punishing economic losses.

Neither the asymptomatic rate nor the prevalence of the coronavirus can be estimated if tests are prioritized to the symptomatic or if the included asymptomatic are unrepresentative (think NBA players).

Instead, we need widespread randomized testing of the population.

It may seem counterintuitive that we should be rooting for a high number of people running around with the virus without symptoms. But that would mean, among other things, that their presence is not creating huge risks for the rest of the population. You want the ratio of mild cases to emergency-room cases to be high.

3. Larry Brilliant says,

We should be doing a stochastic process random probability sample of the country to find out where the hell the virus really is.

Note that he has a lot of anger against President Trump. I won’t push back at Mr. Brilliant (I’m not being sarcastic, that is his name), but I think his rhetoric is stronger than his case. See my post on anger.

4. Dan Yamin says,

But there is one country we can learn from: South Korea. South Korea has been coping with corona for a long time, more than most Western countries, and they lead in the number of tests per capita. Therefore, the official mortality rate there is 0.9 percent. But even in South Korea, not all the infected were tested – most have very mild symptoms.

The actual number of people who are sick with the virus in South Korea is at least double what’s being reported, so the chance of dying is at least twice as low, standing at about 0.45 percent – very far from the World Health Organization’s [global mortality] figure of 3.4 percent.

He is at least taking care not to take statistics at face value. But don’t be satisfied with trying to guess based on data that don’t measure what you want. Try to get the authorities to provide you with the numbers you need.

Computer models and the ADOO loop

When you ask a computer model a question, it provides answers to several decimal places that can be off by several orders of magnitude. Give me a clear, logical back-of-the-envelope calculation grounded in real-world data over a model simulation any day.

Several people have sent me links to papers that use computer models to purport to simulate the economic consequences of alternative strategies for dealing with the virus. I don’t bother reading them. When I see that Jeffrey Shaman’s pronouncements about the rate of asymptomatic spreading are based on a simulation model, I assign them low confidence.

Once you build a model that is so complex that it can only be solved by a computer, you lose control over the way that errors in the data can propagate through the model. For me, it is important to look at data from a perspective of “How much can I trust this? What could make it misleadingly high? What could make it misleadingly low?” before you incorporate that data into a complex model with a lot of parameters.

I read that in the U.S. we have done 250,000 tests for the virus, and yet we have only 35,000 positive cases. But before we jump to any conclusions based on this, we ought to get an idea of how many of these tests are re-tests. If the average person who is tested is tested three times, then almost half of the people being tested are positive. I have no idea what the average number of tests per person actually is–it probably isn’t as high as three, but it isn’t as low as one, either.
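Here is a minimal back-of-the-envelope sketch of that adjustment. The 250,000 tests and 35,000 positives are the figures quoted above; the tests-per-person values are purely hypothetical assumptions:

```python
# Back-of-the-envelope: how re-testing changes the implied positivity rate.
# 250,000 tests and 35,000 positives are the figures quoted above;
# the tests-per-person values are hypothetical assumptions.
total_tests = 250_000
positive_people = 35_000

for tests_per_person in (1.0, 1.5, 2.0, 3.0):
    people_tested = total_tests / tests_per_person
    positivity = positive_people / people_tested
    print(f"{tests_per_person:.1f} tests/person -> "
          f"{people_tested:>9,.0f} people tested, {positivity:.0%} positive")
```

At three tests per person, only about 83,000 people have been tested, and roughly 42 percent of them are positive, which is where the "almost half" figure comes from.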

A lot of people are quoting lines from Gene Kranz in the movie Apollo 13. One of my favorites is when he warns against “guessin’.” Computer models are just “guessin'” in my view. Making decisions based on models is approximately as bad as making them based on blind panic.

I am constantly calling for taking a random sample of the population, say 5,000 people, and testing them on a repeated basis. I am quite willing to divert some testing resources away from people walking in with symptoms. If people have symptoms and we don’t have resources to test them, then isolate them as if they were infected, in a non-hospital setting. You can base the decision about when to hospitalize the person on how their symptoms progress.

We don’t yet have a proven drug treatment, so you don’t really help an infected person by testing them. Testing helps reassure the uninfected people that they don’t need to be totally isolated. That is a benefit, but not enough to justify putting all our resources into people with symptoms, leaving no resources for random testing.
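As a rough illustration of what a repeated random sample of 5,000 could deliver, here is a sketch of the approximate 95 percent confidence interval for a prevalence estimate. The prevalence values themselves are made-up assumptions, just to show the precision such a sample would give:

```python
import math

# Rough precision of a prevalence estimate from a simple random sample of
# 5,000 people.  The prevalence values are hypothetical; the interval uses
# the normal approximation, which gets shaky when expected positives are few.
n = 5_000
for prevalence in (0.005, 0.02, 0.10):
    se = math.sqrt(prevalence * (1 - prevalence) / n)
    lo, hi = max(0.0, prevalence - 1.96 * se), prevalence + 1.96 * se
    print(f"true prevalence {prevalence:.1%}: "
          f"estimate roughly in [{lo:.2%}, {hi:.2%}]")
```

Repeating such a sample every few days would also show whether prevalence is growing exponentially, which is exactly the kind of real-time observation I keep asking for.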

The OODA loop says, “Observe, Orient, Decide, Act.” Right now, our public policy seems to be stuck in an ADOO loop–“act, decide, orient, observe.” I find it frustrating.

Addressing the issue of asymptomatic spreading

From the WSJ.

“Certainly there is some degree of asymptomatic transmissibility,” Anthony Fauci, the director of the National Institute of Allergy and Infectious Diseases, said at a news conference Friday. “It’s still not quite clear exactly what that is. But when people focus on that, I think they take their eye off the real ball, which is the things you do will mitigate against getting infected, no matter whether you are near someone who is asymptomatic or not.”

I think Dr. Fauci has missed the point. It’s one thing for me as an individual to treat everyone around me as if they could be a spreader, and act accordingly. I don’t shut down the economy by washing my hands a lot and staying 6 feet away from people.

But when public officials treat everyone as a spreader and order people to shelter in place, that does shut down the economy. So I think it is important to make an informed decision about whether treating everyone as if they could be spreaders is wise. That is, it would help to be able to know the results of the experiment, or to be able to anticipate the results.

The article goes on to say,

Researchers have posted to the open-access site MedRxiv their own recent studies that used data from the outbreak that suggest people can be infectious sometimes days before they show symptoms of Covid-19. Some reports suggest some carriers never experience any.

But being asymptomatic only makes you dangerous if you can be a spreader. The story gives numbers from one research paper.

. . .early in China’s outbreak, 86% of infections went undetected. The paper also noted that because they were so numerous, stealth infections were the source for roughly 80% of known ones.

This isn’t quite the answer we need, though.

Let C be the event “come in contact with someone with the virus who is asymptomatic.”

Let I be the event “become knowingly infected with the virus.”

What the quoted paragraph gives is the claim that P(C|I)= 80/100. That says that of every 100 people knowingly infected, 80 got the infection from coming in contact with an asymptomatic carrier. What I want to know is P(I|C). Out of 100 people who come in contact with an asymptomatic carrier, how many will become knowingly infected? P(I|C) = P(C|I)*P(I)/P(C).

At first, I thought that there cannot be more asymptomatic carriers than there are people infected, so P(I) has to be greater than P(C). So if the report is correct, out of every 100 people who come into contact with an asymptomatic carrier, more than 80 will become infected. That would seem to justify a lockdown policy.

But remember the important modifier knowingly infected. If not everyone is tested, then certainly there can be more asymptomatic carriers than there are people knowingly infected. If there are 10 times more, then out of 100 people who come in contact with an asymptomatic carrier, only 8 will themselves become infected, and that might not be enough to justify crippling the economy by telling everyone to shelter in place.
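Here is a minimal sketch of that calculation. The 80 percent figure is the one from the quoted paper; the ratio P(C)/P(I) is the unknown, so the sketch simply sweeps over hypothetical values:

```python
# Bayes' rule: P(I|C) = P(C|I) * P(I) / P(C).
# P(C|I) = 0.8 is the figure from the quoted paper; the ratio P(C)/P(I)
# is unknown, so we try several hypothetical values.
p_c_given_i = 0.8

for ratio in (1, 2, 5, 10, 20):           # ratio = P(C) / P(I)
    p_i_given_c = p_c_given_i / ratio
    print(f"P(C)/P(I) = {ratio:>2}: out of 100 contacts with an "
          f"asymptomatic carrier, about {100 * p_i_given_c:.0f} "
          f"become knowingly infected")
```

The whole policy question turns on that unknown ratio, which is exactly what random testing would pin down.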

So I still think we need harder data. And yet once again, I make a plea for random testing. Since we know P(I), if we also knew P(C), we could make an intelligent estimate of the key probability, P(I|C). That in turn would help inform public policy decisions that are of huge import.

Why are polygenic scores not better?

Start with what I said in my review of Robert Plomin’s Blueprint.

Plomin is excited by polygenic scores, a recent development in genetic studies. Researchers use large databases of DNA-sequenced individuals to identify combinations of hundreds of genes that correlate with traits.

The most predictive polygenic score so far is height, which explains 17 percent of the variance in adult height… height at birth scarcely predicts adult height. The predictive power of polygenic scores is greater than any other predictors, even the height of the individuals’ parents.

One can view this 17 percent figure either as encouraging or not. It represents progress over attempts to find one or two genes that predict height, an effort that is futile. But compared to the 80 percent heritability of height it seems weak.

Plomin is optimistic that with larger sample sizes better polygenic scores will be found, but I am skeptical.

My question, to which I do not have the answer, is this: if height is 80 percent heritable, why do polygenic scores explain only 17 percent of the variance in height?

I do not know any biology. But as a statistician, here is how I would go about developing a polygenic score.

1. I would work with one gender at a time. Assume we have a sample of 100,000 adults of one gender, with measurements of height and DNA sequences. I would throw out the middle 80,000 and just work with the top and bottom deciles.

2. For every gene, sum up the total number in the top decile with that gene and the total number in the bottom decile with that gene, and see where the differences are the greatest. If 8500 in the top decile have a particular gene and 1200 in the bottom decile have the gene, that is a huge difference. 7500 and 7200 would be a small difference. Take the 100 largest differences and build a score that is a weighted average of the presence of those genes.

3. To try to improve the score, see whether adding the gene with the 101st largest difference improves predictive power. My guess is that it won’t.

4. Also to try to improve the score, see whether adding two-gene interactions helps the score. That is, does having gene 1 and gene 2 make a difference other than what you would expect from having each of those genes separately? My guess is that some of these two-gene interactions will prove significant, but not many.
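Here is a minimal sketch of steps 1 and 2 on simulated data. Everything in it is made up for illustration (the sample size is scaled down, the genes are binary, and the effect sizes are arbitrary); it is not a description of how real GWAS pipelines work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data, scaled down for illustration: 20,000 people,
# 2,000 binary gene variants, heights driven by a few hundred variants.
n_people, n_genes = 20_000, 2_000
genotypes = rng.integers(0, 2, size=(n_people, n_genes), dtype=np.int8)
effects = np.zeros(n_genes)
effects[:300] = rng.normal(0.0, 0.5, 300)          # arbitrary "true" effects
height = genotypes @ effects + rng.normal(0.0, 4.0, n_people)

# Step 1: keep only the top and bottom deciles by height.
lo_cut, hi_cut = np.quantile(height, [0.10, 0.90])
top, bottom = genotypes[height >= hi_cut], genotypes[height <= lo_cut]

# Step 2: for each gene, compare carrier counts in the two deciles and
# keep the 100 genes with the largest differences.
diff = top.sum(axis=0) - bottom.sum(axis=0)
best = np.argsort(np.abs(diff))[::-1][:100]

# Score = weighted average of the selected genes, weighted by the
# (signed) decile difference.
weights = diff[best] / np.abs(diff[best]).sum()
score = genotypes[:, best] @ weights

print(f"correlation of score with height: "
      f"{np.corrcoef(score, height)[0, 1]:.2f}")
```

Whether a score built this way holds up out of sample, and across populations, is of course the whole question.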

It seems to me that one should be able to extract most of the heritability from the data by doing this. But perhaps this approach is not truly applicable.

Another possibility is that heritability comes from factors other than DNA. Perhaps the reliance on twin studies to try to separate environmental factors from genetic factors is flawed, and the heritability of height comes in large part from environmental factors. Or perhaps DNA is not the only biological force affecting heritability, and we need to start looking for that other force.

Another possibility is that scientists are working with much smaller sample sizes. If you have a sample of one thousand, then the top decile just has one hundred cases in it, and that is not enough to pick out the important DNA differences.

As a related possibility, the effective sample sizes might be small, because of a lot of duplication. Suppose that the top decile in your sample had mostly Scandinavians, and the bottom decile had mostly Mexicans. Your score will be good at separating Scandinavians from Mexicans, but it will be of little use in predicting heights within a group of Russians or Greeks or Kenyans or Scots.

I am just throwing out wild guesses about why polygenic scores do not work very well. I probably misunderstand the problem. I wish that someone could explain it to me.

The self-quarantine decision: my thought process

Even though we have no symptoms and no reason to believe we have been infected, my wife and I are going to try to do everything reasonable to reduce outside contact for a while. Call it “social distancing” or self-quarantining.

This means giving up discretionary trips to the grocery store or other shopping. It means giving up going to dance sessions (that is a big sacrifice, as far as I am concerned). It means not having social meals with others. It means not going to visit our children and grandchildren (an even bigger sacrifice).

My thought process is this:

1. I would rather be in front of an exponential curve than behind it.

When I started my Internet business in April of 1994, most people had not heard of the World Wide Web, and many of those who had heard of it took a “wait and see” attitude about whether it would work out as a business environment. It only became clear that the Web was a business platform more than a year later. But by that time, it was harder to ride the curve.

A lot of people, including government leaders in most countries, are going with a “wait and see” approach before reacting to the virus. They are certainly not getting ahead of the curve. In a few weeks, the self-quarantine decision we are taking may be imposed on everyone. Meanwhile, we hope to reduce our chance of contracting the virus and becoming spreaders.

2. In an uncertain situation, I like to compare the upside and the downside. When the upside of doing something is high and the downside is low, go for it. When it’s the opposite, avoid it.

So think about the upside and the downside of going about our normal business instead of self-quarantining. The upside would be that for the next few weeks I get to dance more and spend more time with friends and family. The downside is that I contract the virus and spread it. I think that the downside, even though it is unlikely, is worse, especially becoming a spreader.

3. How long will we self-quarantine? Either we’ll get something like an “all-clear” signal in a few weeks, or, if my worst fears are correct, there will be government-imposed measures that are as strong or stronger than what we are taking.

4. If I were in government, I would, in addition to making an all-out effort to test people with pneumonia symptoms, be making a large effort to test a sample of asymptomatic people. And re-test people in that sample every few days. From a statistical perspective, random testing strikes me as necessary in order to get a reliable picture of the epidemic. I would not trust an “all-clear” signal that was not backed by evidence from random testing.

Note that this post is not about the current Administration, so please self-quarantine your political comments and take them elsewhere.

UPDATE: John Cochrane recommends an essay by Tomas Pueyo. The message is to respect the exponential curve.

Differences in suicide concentration

Scott Alexander writes,

While genetics or culture may matter a little, overall I am just going to end with a blanket recommendation to avoid being part of any small circumpolar ethnic group that has just discovered alcohol.

That is at the end of a long and typically careful analysis of parts of the world that have high suicide rates.

Because suicide is a rare event, it is very difficult to make inferences from data. Scott, as usual, does a good job of being careful. One note that I would add is that Case and Deaton observed that in the U.S., suicide rates are higher in states with low population density. I don’t know whether this is just coincidence or whether there is in fact something about high population density that protects against suicide.

How to reduce the racial gap in reading scores

According to this study, the problem is worse in progressive cities.

Progressive cities, on average, have achievement gaps in math and reading that are 15 and 13 percentage points higher than in conservative cities, respectively

Pointer from Stephen Green, who sees it as an argument for cities to start to vote Republican.

The study compared test scores in the 12 most progressive cities (according to an independent measure) and the 12 most conservative cities. They report the results in tables. I saw a red flag in that they focused on the achievement gap, rather than black achievement scores per se.

From a Null Hypothesis perspective, one way to reduce the racial gap is to start with dumber white students. Then, when differences in schooling have no effect, you wind up with a smaller racial gap.

Using their tables, I got that for reading, the median score for blacks in the conservative cities was 24.5, and in the progressive cities it was 20.5. The median score for whites in the conservative cities was 61.5, and in the progressive cities it was 69. Since much of the difference in the gap seems to come from whites scoring lower in the conservative cities, I am inclined to go with the Null Hypothesis interpretation.
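A quick check of the arithmetic behind that reading, using the medians just cited:

```python
# Median reading scores cited above, from the study's tables.
conservative = {"black": 24.5, "white": 61.5}
progressive = {"black": 20.5, "white": 69.0}

gap_cons = conservative["white"] - conservative["black"]   # 37.0
gap_prog = progressive["white"] - progressive["black"]     # 48.5

print(f"gap in conservative cities: {gap_cons}")
print(f"gap in progressive cities:  {gap_prog}")
print(f"difference in gaps:         {gap_prog - gap_cons}")
print(f"  contribution of white scores: "
      f"{progressive['white'] - conservative['white']}")    # 7.5
print(f"  contribution of black scores: "
      f"{conservative['black'] - progressive['black']}")    # 4.0
```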

Dalton Conley on polygenic scores

At the AEI, Dalton Conley commented on Charles Murray’s new book. At minute 30, Conley starts to discuss polygenic scores. At around minute 35, he points out that the polygenic score for height, which seems to do much better than polygenic scores for other traits, still does a terrible job. The score, which has been based primarily on data from Europeans, under-predicts heights of Africans by 6 inches.

As you know, I am a skeptic on polygenic scores. The exercise reminds me too much of macroeconomic modeling. Economic history did not design the types of experiments that we need in order to gauge the effect of fiscal and monetary policy. What we want are lots of time periods in which very little changed other than fiscal and monetary policy. But we don’t have that. And as you increase the sample size by, say, going back in time and adding older decades to your data set, you add all sorts of new potential causal variables. Go back 70 years and fluctuations are centered in steel and automobiles. Go back 150 years and they are centered in the farm sector.

Similarly, evolution did not design the types of experiment that we need in order to gauge the effect of genes on traits. That is, it didn’t take random samples of people from different geographic locations and different cultures and assign them the same genetic variation, so that a statistician could neatly separate the effect of genes from that of location or culture.

If I understand Conley correctly, he suggests looking at genetic variation within families. I am not sure that whatever advantage that has is not outweighed by the disadvantage that you reduce the range of genetic combinations you can observe.

What is the true margin of error?

Alex Tabarrok writes,

The logic of random sampling implies that you only need a small sample to learn a lot about a big population and if the population is much bigger you only need a slightly larger sample. For example, you only need a slightly larger random sample to learn about the Chinese population than about the US population. When the sample is biased, however, then not only do you need a much larger sample, you need it to be large relative to the total population.

I am curious what Tabarrok means in the first sentence by “need a slightly larger sample.” I thought that with random sampling, the margin of error for a sample of 1,000 is the same whether you are sampling from a population of 10 million or 50 million.
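A minimal sketch of that point, using the textbook margin-of-error formula with the finite population correction (the population sizes are just examples):

```python
import math

def margin_of_error(n, N, p=0.5):
    """Approximate 95% margin of error for a simple random sample of n
    drawn from a population of N, with the finite population correction."""
    se = math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))
    return 1.96 * se

n = 1_000
for N in (10_000_000, 50_000_000, 1_400_000_000):
    print(f"population {N:>13,}: margin of error {margin_of_error(n, N):.2%}")
```

For any population far larger than the sample, the correction factor is essentially 1, so a sample of 1,000 gives roughly a 3-point margin of error whether the population is 10 million or 50 million.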

But the issue at hand is how a small bias in a sample can affect the margin of error. We frequently see election results that are outside the stated margin of error of exit polls. As I recall, in 2004 conspiracy theorists who believed the polls claimed that there was cheating in the counting of actual votes. But what is more likely is that polling fails to obtain a true random sample. This greatly magnifies the margin of error.

In real-world statistical work, obtaining unbiased samples is very difficult. That means that the true margin of error is often much higher than what gets reported.
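Here is a rough sketch of how a modest amount of differential non-response can push a poll outside its nominal margin of error; every number in it is a made-up assumption:

```python
import math

# Hypothetical electorate: 52% support candidate A, but A's supporters
# answer the poll a bit less often than B's supporters (made-up rates).
true_share = 0.52
respond_a, respond_b = 0.45, 0.55

# Expected share of A among poll respondents.
polled_share = (true_share * respond_a) / (
    true_share * respond_a + (1 - true_share) * respond_b)

n = 1_000
nominal_moe = 1.96 * math.sqrt(0.5 * 0.5 / n)

print(f"true share of A:             {true_share:.1%}")
print(f"expected polled share:       {polled_share:.1%}")   # about 47%
print(f"bias from non-response:      {true_share - polled_share:.1%}")
print(f"nominal 95% margin of error: +/- {nominal_moe:.1%}")
```

The bias is baked into the sampling process, so it does not shrink as the sample grows; only a genuinely random sample, or careful re-weighting, brings the true margin of error back down toward the reported one.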