Poor Replication in Economics

Andrew C. Chang and Phillip Li write,

we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.

Pointer from Mark Thoma.

As an undergraduate at Swarthmore, I took Bernie Saffran’s econometrics course. The assignment was to find a paper, replicate the findings, and then try some alternative specifications. The paper I chose to replicate was a classic article by Marc Nerlove, using adaptive expectations. The data he used were from a Department of Agriculture publication. There was a copy of that publication at the University of Pennsylvania, so I went to their library and photocopied the relevant pages. I typed in the data, put it into the computer at Swarthmore, and got results that were nowhere close to Nerlove’s.

6 thoughts on “Poor Replication in Economics”

  1. That goes along with the widespread failures to replicate most major cancer studies and psychology studies.

    And can anybody even keep track of which food components are supposed to be good or bad anymore?

    And that’s just the ‘falsifiable in principle’ stuff.

    If one were to apply some new forensic technique to old cases and find out that not merely a handful but the majority of prisoners were wholly innocent, the legitimacy of the entire justice system would be thrown into doubt, with no remaining presumption that people could and should still trust most verdicts.

    The public would be outraged, and even insiders would do a lot of soul-searching and correctly conclude that there is something enormously and fundamentally rotten with the current mechanisms of review, and indeed with the whole enterprise; that the new revelations cast doubt on every case and warrant an attitude of radical skepticism; and that what is required is profound, broad, and deep reform in the direction of more intense independent scrutiny, adversarial auditing, and serious personal liability for bad results.

    I’m not detecting that yet – or even much of a “Bayesian adjustment of priors” – for the softer sciences.

    • “The public would be outraged and even insiders would do a lot of soul-searching”

      Always the optimist!

  2. Given the sample size, I would not place much confidence in the “less than half” finding holding up. Ironically, their own claim seems less statistically robust than what conventional practice would demand. A quick back-of-the-envelope check is sketched below.
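    Here is a minimal Python sketch of the sampling uncertainty in “29 of 59” (the Wilson score interval is one standard choice of method on my part, not anything from the paper):

    ```python
    # How much sampling uncertainty is in "29 of 59"?
    # Wilson score interval for a binomial proportion; the 29/59 figure
    # is the one quoted from Chang and Li above.
    import math

    def wilson_interval(successes, n, z=1.96):
        """95% Wilson score confidence interval for a binomial proportion."""
        p_hat = successes / n
        denom = 1 + z**2 / n
        center = (p_hat + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
        return center - half, center + half

    lo, hi = wilson_interval(29, 59)
    print(f"replication rate: {29/59:.0%}, 95% CI ({lo:.0%}, {hi:.0%})")
    ```

    With n = 59, the 95% interval runs from roughly 37% to 62%, so the data are consistent with anything from “about a third replicate” to “about two thirds replicate”; “less than half” is not a precise estimate.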

  3. Economics has also largely resisted efforts to put in place ethical standards. Incentives seem to matter for everyone but economists.

  4. I think the impact of data cleaning and data processing is underestimated. Yes, data need to be cleaned, but usually there are many choices made along the way, some of which are debatable. Let’s be charitable and assume that a big part of the replication problem is due to this. It only changes the nature of the problem: data cleaning has a big impact.

    To put it another way, recall Ed Leamer’s Specification Searches, which Dr. Kling has mentioned before. Leamer exhorts economists to present (I paraphrase) a spectrum of results, not just *a* result. This isn’t about dishonest research (although that might also be a problem). It is about seeing research as a process of making many choices about how best to analyze data, and for full transparency we should see the range of results those choices produce, not just one. A toy illustration follows.
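    Here is a minimal sketch of Leamer’s point on simulated data; the cleaning and specification choices in it (an outlier rule, a log transform, an optional control) are hypothetical stand-ins, not taken from any actual paper:

    ```python
    # Run every combination of debatable cleaning/specification choices
    # and report the whole spectrum of estimates, not a single preferred
    # one. Data are simulated; the true effect of x on y is 0.5.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    z = rng.normal(size=n)                  # a possible control
    x = z + rng.normal(size=n)              # regressor of interest
    y = 0.5 * x + z + rng.normal(size=n)    # outcome
    y[:5] += 20                             # a few suspicious outliers

    def estimate(drop_outliers, log_y, control):
        yy, xx, zz = y.copy(), x, z
        if drop_outliers:                   # debatable cleaning rule
            keep = np.abs(yy - yy.mean()) < 3 * yy.std()
            yy, xx, zz = yy[keep], xx[keep], zz[keep]
        if log_y:                           # debatable transform
            yy = np.log(yy - yy.min() + 1)  # shift to keep logs defined
        cols = [np.ones_like(xx), xx] + ([zz] if control else [])
        beta, *_ = np.linalg.lstsq(np.column_stack(cols), yy, rcond=None)
        return beta[1]                      # coefficient on x

    results = {flags: estimate(*flags)
               for flags in itertools.product([False, True], repeat=3)}
    print(f"spectrum of estimates: {min(results.values()):.2f} "
          f"to {max(results.values()):.2f} across {len(results)} specs")
    ```

    Eight defensible specifications, one data set, and a visible spread of estimates: that is the case for reporting the range rather than a single number.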

  5. A large chunk of my dissertation was replication. I did it without the help of the authors, using the descriptions in the papers themselves to find the data and methods.
    I was successful. What discouraged me was how few authors corresponded with me.
