The working backwards algorithm, explained a bit.

As Nate Silver says, case counts are meaningless. He goes into the weeds of how testing protocols affect reported cases. I say don’t bother.

I want to try to infer the number of people who have been infected from the number of deaths. I assume that death reports in the U.S. are a more reliable indicator of what is going on here. Call the number of deaths D. The problem is that we don’t know three things:

–the number of deaths per 1000 people who have been infected. Call this r. So if the death rate (relative to the true number of people who have been infected, which is not at all the same as the number of reported cases) is 2 percent, then r = 20.
–the typical number of days between infection and death. Call this n.
–the growth rate of the number of infections from n days ago until now. Call this g(n). Because the 3DDRR stayed at 2 for a long time, I was estimating g(n) as 2^(n/3).

The algorithm to estimate the number of people who have been infected as of today is D*g(n)*1000/r. If the number of days between infection and death is 9, then n = 9. If g(n) = 2^(n/3), then g(9) = 8. So if r = 20, the estimate of the number of people who have been infected as of today is 8*1000/20 = 400 times the number of deaths as of today. For New York state, that would mean about 1.64 million people infected. If instead we assume that n = 15 and r = 2 (0.2 percent of infected people die), then g(15) = 32 and the multiplier is 16,000, which says that 64 million New Yorkers have been infected, clearly an over-estimate. Either r is greater than 2 or g(n) is less than 32, or both. g(n) could be less than 32 even though n is 15, provided that the growth rate of infections per day started to drop in recent days, which would mean that the 3DDRR is going to drop soon. In fact, I’m inclined to expect a pretty dramatic drop in 3DDRR for New York in the coming days.
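
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula D*g(n)*1000/r with g(n) = 2^(n/3); the function names `estimate_infected` and `multiplier` and the `doubling_days` parameter are made up for this sketch and are not from the post.

```python
def estimate_infected(deaths, n_days, deaths_per_1000, doubling_days=3.0):
    """Working-backwards estimate: D * g(n) * 1000 / r.

    Assumes g(n) = 2 ** (n / doubling_days), i.e. a 3DDRR that stays at 2
    when doubling_days is 3. That assumption is the post's, and it fails
    as soon as growth slows.
    """
    growth = 2 ** (n_days / doubling_days)
    return deaths * growth * 1000 / deaths_per_1000


def multiplier(n_days, deaths_per_1000, doubling_days=3.0):
    """Estimated infections per reported death (the estimate with D = 1)."""
    return estimate_infected(1, n_days, deaths_per_1000, doubling_days)


# The two scenarios discussed in the post.
print(multiplier(9, 20))   # n = 9, r = 20
print(multiplier(15, 2))   # n = 15, r = 2
```

The first scenario gives 400 infections per death and the second gives 16,000, which is why the second set of assumptions produces an implausibly large total.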

9 thoughts on “The working backwards algorithm, explained a bit.”

  1. “g(n) could be less than 32 even though n is 15, provided that the growth rate of infections per day started to drop in recent days, which would mean that the 3DDRR is going to drop soon. In fact, I’m inclined to expect a pretty dramatic drop in 3DDRR for New York in the coming days.”

    Pretty important point here, if you expect a dramatic drop in 3DDRR. The growth rate in infections leads the 3DDRR by n days. So the g(n) that determines growth in infections between n days ago and now is really given by the 3DDRR over the *next* n days. That will be much less than 2^(n/3) if the 3DDRR over the next n days drops down to close to 1. That may be why earlier calculations with n=15 or n=21 produced extremely high numbers, exceeding the population of NYC. If deaths level out over the next n days, then that means infections leveled out over the past n days.

    • To drive home the point, in his 4/4 post, “The 3DDRR”, Arnold says that, “It would be nice to see, say, 1.5 by April 7, 1.3 by April 10, 1.1 by April 13, and 1.002 from April 16 on”. If that were to happen, then even for n=21,

      g(21) = (1.5)(1.3)(1.1)*(1.002)^4 = 2.16.

      Obviously, that is much smaller than 2^(21/3) = 128.
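
A minimal sketch of the same calculation, assuming each 3DDRR value is the factor by which the count multiplies over one three-day period, so that g(n) is the product of the n/3 successive ratios. The helper name `growth_from_3ddrr` is hypothetical, and the declining path is just the wished-for sequence quoted above, not observed data.

```python
from math import prod


def growth_from_3ddrr(ratios):
    """Cumulative growth factor implied by a sequence of three-day ratios."""
    return prod(ratios)


# Seven three-day periods cover n = 21 days.
declining = [1.5, 1.3, 1.1] + [1.002] * 4   # the hoped-for path quoted above
constant = [2.0] * 7                        # 3DDRR stuck at 2, i.e. 2^(21/3)

print(round(growth_from_3ddrr(declining), 2))  # 2.16
print(growth_from_3ddrr(constant))             # 128.0
```

Swapping in a different list of ratios shows how sensitive the infection estimate is to the assumed path of the 3DDRR.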

  2. BTW—

    “A tiger at the Bronx Zoo tests positive for coronavirus”
    By Alaa Elassar, CNN

    Dogs have tested positive for the virus in Hong Kong.

    One may wonder at the efficacy of lockdowns if this particular virus easily hops between different species such as felines and canines and perhaps all other domesticated animals. I hope we do not end up destroying flocks of poultry.

    Or, for that matter, many wild species too. Some speculate this is a bat virus. What if it can cross over into squirrels or pigeons?

    • A lot of terrestrial animal life runs on a very similar biochemical framework and operating system, with a lot of use of common compounds. Viruses want to penetrate cell membranes and hijack some genetic machinery, and these are similar enough in species as distant as primates and egg-laying birds that we have lots of avian flus.

      The point is that some viruses can indeed jump species and those which can have a big advantage with spreading through modern human populations, where they are unlikely to persist long enough to keep evolving new strains, and need alternative, natural reservoirs to stick around. If other animals we use as pets or livestock can also catch it easily, then that would be a big worry. It sometimes is, though as far as anyone can tell, not a big one for c19.

      Still, of all the reservoirs, bats are the worst, and we should kill all the bats, or lots of us will die.

  3. I’ve posted this calculator in various forums but I don’t recall if I posted it here already. In any event, you can vary just about any variable you want. Note that you can drag the vertical dashed line which allows you to gauge the effect of intervention, and also put in a new R value in the top slider for the estimate of the effect of that intervention. Also, dragging your mouse across the graph updates the figures on the left-hand side.

    Epidemic Calculator

    http://gabgoh.github.io/COVID/index.html

  4. It all looks good to me.

    Extend it to equilibrium.
    At equilibrium we have two distributions, the immunes and the infected. But these two distributions, in adaptation, need to share the same spreading bandwidth. Some proofs are needed here.

    What is the outbreak size at equilibrium? Use 1000 for New York; thus there are 8000 districts, and health services organize outbreak teams around that number.

    Total death rate? Assume that immunes have about a 40/1 ratio in sharing the spread bandwidth (this is the seasonal immunity vs. the two-week infection period). Then the virus distribution is almost completely overlapped by antibodies, with some 2.5% not overlapped. But one has to be infected and not overlapped, and then only 2% die. The rest of the virus distribution is neutralized by antibodies. So, to the nearest 1000 people, at equilibrium, the yearly death rate is less than 2.5% * 2% = .05% of 8 million, or 4,000.

    Still quite unsustainable. But I am assuming symmetrical distributions, hence the worst case. The immune people and virus people are not cooperating with each other, except at outbreaks. So the arrival of a virus in a susceptible neighborhood is an independent event of low probability, and its complement, the arrival of immunity in a neighborhood, is an independent event of high probability. At equilibrium, these arrivals have the characteristic that the number of neighborhoods queued up for the virus is very small, and very far left on the distribution; the probability of one virus event being queued up takes up more of the distribution, so it is not symmetric but very left-centered. This is the opposite of the antibodies, which usually have forty neighborhoods queued up and a well-centered distribution. So the tail end of the virus distribution is small and rare, and the antibodies' center is on top of it. The distribution of uncovered virus is likely a fifth at least of the number posted. All the single virus outbreaks get covered right away; this is the adaptation. Almost never does the virus manage to get two in action together. One can see that, by assuming symmetric distributions, I am counting a bunch of virus combinations that will not occur.

    • Distributions explained better.

      This is a queueing problem, a two-color queueing problem. You have two queues, one of virus and one of antibody. What is the probability that the virus queue will overrun the antibody queue? Much lower than for a pair of centered Gaussian distributions. For the virus, almost all its events will be one or two neighborhoods in a bunch. For the antibody, there are almost always, everywhere, many more antibody neighborhoods queued up all around to squash the outbreak. (Remember, I am talking about equilibrium, so I assume the antibodies have everything hedged and I can talk about the real-time battle on the margin.) So I can treat this as a queueing problem in the abstract and compute the probability that the virus queue outruns the antibody queue. And I assume equilibrium means arrivals are independent and Gaussian, which is correct under a no-arbitrage condition.
