More on Schooling, Deschooling, and the Null Hypothesis

Four links.
1. A NYT article on computerized grading of essays. I highlight the response of the Luddites:

“My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Mr. Perelman, a retired director of writing and a current researcher at M.I.T.

He is among a group of educators who last month began circulating a petition opposing automated assessment software. The group, which calls itself Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, has collected nearly 2,000 signatures, including some from luminaries like Noam Chomsky.

“Let’s face the realities of automatic essay scoring,” the group’s statement reads in part. “Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.”

Suppose, for the sake of argument, that the software does poorly now and can be fooled easily. My bet is that within five years there will be software that can pass a Turing test of the following sort.

a. Assign 100 essays to be graded by four humans and the computer.

b. Show the graded essays to professors, without telling them which set was computer-graded, and have them rank the five sets of essays in terms of how well they were graded.

c. See if the computer’s grading comes in higher than 5th.
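As a rough sketch of how step (c) might be scored, the snippet below averages each grader's place across the professors' rankings and checks whether the computer finishes above last. The function name, the grader labels, and the sample rankings are all hypothetical, made up purely for illustration; no such test has been run.

```python
from statistics import mean

def computer_passes(rankings, computer="computer"):
    """Return (computer's overall place, whether it beat last place).

    rankings: one list per professor, ordering the grader labels
    from best-graded set to worst-graded set.
    """
    places = {}
    for ordering in rankings:
        for place, grader in enumerate(ordering, start=1):
            places.setdefault(grader, []).append(place)
    # Average place per grader; a lower average means better grading.
    average = {grader: mean(p) for grader, p in places.items()}
    overall = sorted(average, key=average.get)
    computer_place = overall.index(computer) + 1
    return computer_place, computer_place < len(overall)

# Hypothetical rankings from three professors (illustrative data only).
sample_rankings = [
    ["human_a", "computer", "human_b", "human_c", "human_d"],
    ["human_b", "human_a", "computer", "human_d", "human_c"],
    ["human_a", "human_c", "human_b", "computer", "human_d"],
]

place, passed = computer_passes(sample_rankings)
print(place, passed)
```

Averaging places is only one way to combine the professors' rankings; a median or a head-to-head count would serve the same purpose.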

While we are waiting for this test, the NYT article points to a nice paper by Mark D. Shermis summarizing results of a comparison of various software essay-grading systems.

2. Isegoria points to Bloom’s 2-Sigma Problem,

The two-sigma part refers to average performance of ordinary students going up by two standard deviations when they received one-to-one tutoring and worked on material until they mastered it, and the problem part refers to the fact that such tutoring doesn’t come cheap.

I am skeptical. It is possible that this educational intervention is so radically different from anything else that has ever been tried that it works much better than other interventions. But I would bet that if another set of researchers were to attempt to replicate this study, they would fail to find similar results. In social science in general, we do too little replication. This is particularly important when someone claims to have made a striking finding.

3. In the comments on this post, I found this one particularly interesting and articulate:

I think K-12 public schools are about warehousing children, giving parents childcare, whether they are at work or simply want a break from being around their kids (the quality of parenting going on is incredibly wide-ranging).

…why the current system is still in place-Cost, Convenience, Comfortability and Childcare. Unfortunately, the one-size-fits-all approach is ineffective, makes young people passionately hate school (which breeds some serious anti-intellectual pathologies) and is becoming even more centralized in curriculum and control. (See Common Core curriculum adopted by 48 states.)

I think that the Childcare aspect deserves more notice. When President Obama supports universal pre-school, the “scientific” case is based almost entirely on taking kids out of homes of low-functioning parents. But what affluent parents hear is “Obama is going to pay for my child care,” and that is what makes the policy popular.

More generally, assume that as a parent you believe that your comparative advantage is to work, rather than spend the entire day with your child. Then ask yourself why as a parent you would prefer to have your child in school rather than home without supervision. Even if the child learns less at school than they would at home, you still might prefer the school, as long as you are convinced that it reduces the risk of your child getting into really bad trouble.

4. From Michael Strong, in a long comment pushing back on my post last week.

No one doubts that if one compares one group that receives significant practice in an activity against another group with no exposure to the activity at all, that a treatment effect exists.

Why then are so many people skeptical that interventions in education make a difference? Largely because the comparisons exist between idiotic variations within a government-dominated industry.

As a rejoinder, I might start by changing “receives significant practice” to “engages in significant practice.” “Learning a skill” and “engaging in significant practice” are so closely related that I would say that, to a first approximation, they are the same thing.

This leads me to the following restatement of the null hypothesis.

The null hypothesis is that when you attempt an educational intervention, such as a new teaching method, the overall economic value of the skills that an individual acquires from age 5 to 20 is not affected by that intervention. I will grant that if you take two equivalent groups of young people and give one group daily violin lessons and the other group daily clarinet lessons then the first group is more likely to end up better violinists on average.

But when economists measure educational outcomes, they usually look at earnings, which result from the market value of skills acquired. To affect that, you have to affect the ability and willingness of a person to engage in practice in a combination of generally applicable fields and fields that are that person’s comparative advantage.

Aptitude and determination matter. Consider Malcolm Gladwell’s “10,000 hour rule” for becoming an expert at something. There is a huge selection bias going on in that rule. How many people who have little aptitude for shooting a basketball are going to keep practicing basketball for 10,000 hours?

When you consider how hard it is to move the needle half a standard deviation on a fourth-grade reading comprehension exam, the chances are slim that you are going to come up with something that affects long-term overall outcomes. Until we get the Young Lady’s Illustrated Primer.

Utilitarianism and the Three-Axes Model

Lifted from the comments on this post:

Where does someone whose commitment is to utilitarianism + empiricism fit into the framework? It seems to me that this describes a large number of people on both sides of the political center (Democratic technocrats and the endangered species of Republican good government types) who think that barbarism, liberty, and oppression are awfully abstract notions to base concrete government decisions on.

I probably need to emphasize that I do not think that the three axes are used to reason carefully about political issues. I think that they are used rhetorically to solidify loyalties and to put down opponents. If someone uses utilitarianism and empiricism, then that tends to involve reasoning carefully, and it gets away from the three axes.

I do not think that one encounters much careful reasoning nowadays. I think people are too busy engaging in political tribalism. I have a lot more to say about this in my 60-page e-book, which I hope will be available in the not-too-distant future.

Basel and the So-Called Savings Glut

Thomas Hoenig and I have new commentaries expressing similar thoughts. Hoenig said,

We know from years of experience using the Basel capital standards that once the regulatory authorities finish their weighting scheme, bank managers begin the process of allocating capital and assets to maximize financial returns around these constructed weights. The objective is to maximize a firm’s return on equity (ROE) by managing the balance sheet in such a manner that for any level of equity, the risk-weighted assets are reported at levels far less than actual total assets under management. This creates the illusion that banking organizations have adequate capital to absorb unexpected losses. For the largest global financial companies, risk-weighted assets are approximately one-half of total assets. This “leveraging up” has served world economies poorly.

Read the whole thing. Then read my latest essay.

So what accounts for the low interest rate on long-term bonds, particularly those of the U.S. government? It is not “quantitative easing.” It is not a mysterious shift in preferences among savers. It is that banks, which enjoy enormous advantages in attracting funds from savers due to actual and perceived protection offered by governments, have a strong incentive to direct these savings into financial instruments that their regulators have designated as having little or no risk. Risk-based capital regulations may be ineffective at promoting bank safety. But they are plenty effective at allocating capital away from productive private investments and toward government bonds.

I also thought the Hoenig quote worth including in the essay.

The Stock Market’s Gains

What is the most positive economic news that we have received over the last six months? I am using “news” to mean “unexpected” or “somewhat surprising.”

The answer that comes to my mind is the increase in the stock market. But if the stock market is up on the basis of little or no positive economic news otherwise, then that sort of says that people are buying stocks because the market has been going up. That’s not what one would call a sustainable model. So as of today, I am even a little lighter into stocks than I was yesterday, and I was relatively light yesterday. To be sure, in inflation-adjusted terms, the market is not actually at an all-time high. But I figure that whatever my willingness to buy stocks was six months ago, it should be less now, no?

Speaking of stocks, I know of a newsletter writer who recommends specific stocks and always adds what he calls a “protective stop.” So he might say, “buy X at 20, but put in a stop-loss order to sell if it goes down to 18.” This struck me as a strange strategy, but today I was pondering it and I think I have it figured out.

Suppose that the stocks that he recommends are really no better or worse than buying an index fund. So, without the stop-loss orders, if you followed his buy recommendations you would get the exact same return as the market. With the stop-loss orders, it’s as if you are buying the market portfolio along with a portfolio of out-of-the-money put options. In this case, though, you only pay the option premium if the market bounces around, so you buy a stock at 20, sell it at 18, then buy it back (or buy some other stock recommended by the newsletter) when it goes back up to 20.

I think that this approach minimizes the chances that you will regret taking the writer’s advice. If the market rallies, you will be happy with your gains. If it falls, you will be happy that your losses are limited. And if it bounces up and down you are unlikely to notice that the advice is giving you a tendency to buy at the highs and sell at the lows. So I think this strategy would appeal to regret-averse investors. But it’s not a strategy that appeals to me.
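To make the three cases concrete (rally, fall, bounce), here is a minimal sketch that compares the stop-loss rule with simply buying and holding. It assumes every order fills at exactly the observed price, ignores commissions, and uses made-up price paths; the function names are hypothetical.

```python
def stop_loss_pnl(prices, buy_at=20.0, stop_at=18.0):
    """Per-share profit from buying at buy_at or above and selling at stop_at or below."""
    holding = False
    entry = 0.0
    pnl = 0.0
    for price in prices:
        if holding and price <= stop_at:
            pnl += price - entry      # stopped out: realize the loss
            holding = False
        elif not holding and price >= buy_at:
            entry = price             # buy, or buy back after a stop
            holding = True
    if holding:
        pnl += prices[-1] - entry     # mark any open position to the last price
    return pnl

def buy_and_hold_pnl(prices):
    """Per-share profit from buying at the first price and never selling."""
    return prices[-1] - prices[0]

# Illustrative price paths, not real market data.
rally = [20.0, 22.0, 25.0]
fall = [20.0, 18.0, 12.0]
bounce = [20.0, 18.0, 20.0, 18.0, 20.0]

for name, path in [("rally", rally), ("fall", fall), ("bounce", bounce)]:
    print(name, stop_loss_pnl(path), buy_and_hold_pnl(path))
```

In the bounce path the stop-loss follower loses 2 on each round trip while the buy-and-hold investor ends flat, which is the "option premium" that only gets paid when the market bounces around.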

The Minerva Project

David Brooks mentions it. InsideHigherEd describes it.

While MOOCs are basically supersized lectures offered to tens of thousands rather than hundreds of students, Minerva wants to use learning analytics to scale up Oxbridge-style tutorials to seminar-size online classes taught by professors who can work remotely from any location in the world.

…This, Nelson says, will avoid the limitation of the in-person lecture — namely that whatever is said just “vanishes into thin air.”

Thanks to Tyler Cowen for the pointer.

This sounds interesting. I was hoping to create a virtual seminar when Nick Schulz and I used Google+ hangouts. Here is where we discussed Charles Murray’s book Coming Apart.

I liked the seminars when I was at Swarthmore College. Each week the seminar met, one or two students would be assigned to write short papers to be the center of discussion. For example, I once was assigned a paper on the “cobweb model,” in which farmers base next year’s output on this year’s price. After much painful thinking, I denounced this model as irrational. On my own, I located John Muth’s paper, but I could not follow the math. What I came up with on my own instead was essentially the hypothesis of perfect foresight. It turned out, unbeknownst to me, that right at that time the topic of “rational expectations” was about to take the economics profession by storm.

The other characteristic of Swarthmore that I have championed is the outside examiner. That is, the professor who puts together the syllabus and leads the class is not the same as the professor who puts together the assessment and grades the students.

I hope Minerva is successful with the idea of virtual seminars. I think that the risk is that it is positioned in a sort of no-man’s land, in between the backward model of existing universities and some more radical model of self-directed education that will emerge over the next decade. On the latter, look at these Unschooling Conferences, such as the Trailblazer gathering. Right now, these conferences signal the participant’s weirdness (as Bryan Caplan would predict), but if that should tip….

Chris Peterson is a Minerva skeptic.

If Minerva has higher standards than Harvard, then how is a student who can’t get into Harvard supposed to get into Minerva?

Read the entire rant. I, too, am skeptical. I remember Chris Whittle’s big education venture, called The Edison Project. It was pretty much all hat and no cattle, as they say in Texas. I was wary when he hired Benno Schmidt of Yale for a lot of money. I think if you are going to be an outside force disrupting education, you need to be an outside force. Somebody who has achieved prestige in the existing system is less likely to have the drive and originality to change it.

If I had a lot of VC money to launch a higher education start-up, I would consult with creative, unhappy professors at low-prestige places to mine their ideas. That said, I would not put them onto the management team. Unhappy people are unhappy people, so I would go with a non-academic management team to keep things sane. You can get inspiration from crazy, unhappy people, but they don’t do well in organizations.

Speaking of organizations, Henry Brighous writes,

While Fisman and Sullivan don’t really comment on this – they simply go on to describe the other kinds of coordination that AA undertakes – it’s hard for me to see how firing an employee simply for explaining how the internal process works to good effect could be efficient. It doesn’t provide any clear, useful incentives to improve overall efficiency. Nor is it conducive to a happy and productive employee culture. The simplest explanation is that Mr. X got fired because his bosses were self-aggrandizing *****s, who saw any public commentary as potential insubordination to be ruthlessly punished, even if this made for a more dysfunctional organization.

To get the context, you have to read the post, and perhaps read the book that he is discussing.

I remember when a project manager at Freddie Mac organized a session where team members could air their gripes. When she had heard all of the complaints about the stupidity and disorganization of the higher-ups, she told the group that they should be happy that Freddie Mac wasn’t perfect, because if it was already perfect none of them would have jobs.

Indeed, one way to think of an organization is as a mountain of dysfunction. The job of managers at all levels is to try to chip away at that dysfunction. Maybe Henry is correct that Mr. X got fired because his bosses were jerks, but maybe Mr. X got fired because instead of chipping away at the dysfunction, he was contributing to it. I am not saying that I think he should have been fired. I have no idea. Corporate soap opera is complicated.

Kling’s Law of Bank Capital Regulation

Thomas L. Hogan, Neil Meredith, and Xuhao Pan write,

we find that the standard capital ratio is significantly better than the RBC ratio as an indicator of bank risk and performance and that using both ratios simultaneously does not produce better results. Taken in conjunction with the other available evidence, our findings indicate that RBC regulations lead to more risk-taking by individual banks, and more overall risk in the banking system, without improving the effectiveness of the Fed’s capital regulations.

RBC = risk-based capital. Kling’s law is that the capital measure used by regulators will, over time, come to be outperformed by a measure that the regulators are not using. So, if you are using standard capital, risk-based capital measures will better predict bank risk, and conversely.

The reason can be found in my essay, The Chess Game of Financial Regulation.

Regulatory systems break down because the financial sector is dynamic. Financial institutions seek to maximize returns on investment, subject to regulatory constraints. As time goes on, they develop techniques and innovations that produce greater returns but which can also undermine the intent of the regulations.
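As a small worked example of the two measures, the sketch below computes a standard capital ratio (equity over total assets) and an RBC ratio (equity over risk-weighted assets) for a made-up balance sheet. The risk weights, amounts, and function name are illustrative assumptions, not actual Basel weights or any real bank's figures.

```python
def capital_ratios(equity, holdings):
    """Return (standard ratio, risk-based ratio).

    holdings: list of (amount, risk_weight) pairs.
    Assumes risk-weighted assets are greater than zero.
    """
    total_assets = sum(amount for amount, _ in holdings)
    risk_weighted = sum(amount * weight for amount, weight in holdings)
    return equity / total_assets, equity / risk_weighted

# Hypothetical bank: 100 in assets, 4 in equity.
# Half the assets carry an assumed zero weight, half an assumed full weight.
before = capital_ratios(4.0, [(50.0, 0.0), (50.0, 1.0)])

# Same bank after shifting 25 from loans into zero-weight bonds.
after = capital_ratios(4.0, [(75.0, 0.0), (25.0, 1.0)])

print(before)  # standard ratio, RBC ratio
print(after)
```

In this sketch the standard ratio stays at 4 percent in both cases, while the RBC ratio doubles from 8 to 16 percent just by moving assets into the zero-weight bucket, with no new equity raised.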

Tantalizing Findings

David Autor and Melanie Wasserman summarize trends in education and labor market outcomes by gender. Timothy Taylor locates their explanation for the relative decline among males.

the earnings power of non-college males combined with gains in the economic self-sufficiency of women—rising educational attainment, a falling gender gap, and greater female control over fertility choices—have reduced the economic value of marriage for women. This has catalyzed a sharp decline in the marriage rates of non-college U.S. adults—both in absolute terms and relative to college-educated adults—a steep rise in the fraction of U.S. children born out of wedlock, and a commensurate growth in the fraction of children reared in households characterized by absent fathers.

The second part of the hypothesis posits that the increased prevalence of single-headed households and the diminished child-rearing role played by stable male parents may serve to reinforce the emerging gender gaps in education and labor force participation by negatively affecting male children in particular. Specifically, we review evidence that suggests that male children raised in single-parent households tend to fare particularly poorly, with effects apparent in almost all academic and economic outcomes. One reason why single-headedness may affect male children more and differently than female children is that the vast majority of single-headed households are female-headed households. Thus, boys raised in these households are less likely to have a positive or stable same-sex role model present.

As I interpret it, their story is one of mutually reinforcing economic and social trends. The economic trend is that the comparative advantage of non-college-educated males in the work force has declined, as innovation and globalization have increased productivity in manufacturing. This reinforces a social trend in which those males are not attractive marriage partners, so that women who formerly would have married them are instead having children out of wedlock. This social trend then reinforces the economic trend, because men born out of wedlock are disadvantaged when it comes to being able to remain in school.

I would say that the trends are real, but the narrative is controversial. I think this is a situation where you pick your narrative to fit your policy recommendation. Are you Bryan Caplan, and do you recommend promoting marriage? Then your narrative has to be that marriage plays a causal role in improving men’s earnings. Are you Barack Obama, and do you recommend expanding pre-school and access to college? Then your narrative is that the main causal factor is education. Are you Charles Murray and do you recommend promoting Victorian virtues? Then your narrative is that this is a civilization-barbarism problem, and we have to reverse the slide into barbarism.

My preferred narrative is that Neal Stephenson predicted this in The Diamond Age. The Vickies and the Thetes have divergent lifestyles, and I suspect that the attempt by the Vickies to impose their lifestyle on the Thetes is doomed to fail.

On the topic of marriage trends, Reihan Salam writes,

instead of serving as a foundation of a successful adult life (a “cornerstone”), it is seen as a culmination of a successful young adulthood (a “capstone”), according to the authors of the Knot Yet report on delayed marriage.

Pointing out the likely correlation between a decline in marriage and an increase in government dependency, Salam writes,

My suspicion is that it will be very difficult to construct such a post-marital libertarian agenda, but that’s not to suggest it’s a futile effort.

He then writes,

What I find interesting is the emerging tension between two tendencies on the center-left: (1) the civil libertarian desire to protect the autonomy of families, particularly families rooted in minority cultural traditions, as a post-marital culture yields ever more children raised in the context of highly fragile, unstable family relationships; and (2) the egalitarian imperative to do more to build the human capital of children raised in the poorest households, an effort that may well require increasingly intrusive, heavy-handed, paternalistic interventions.

At the risk of being uncharitable, I do not think that (1) is a factor. Using the three-axes model, the single mom is in the oppressed class and her disadvantaged offspring are in the oppressed class, end of story.

In the talk that I gave in Phoenix, I compared universal pre-school to eugenics. Both appeal to the same desire to improve the human race based on “scientific evidence” of the unfitness of some parents.

Sentences to Ponder

From Brad DeLong:

the twentieth century is unique in that its wars, purges, massacres, and executions have been largely the result of economic ideologies. Before the twentieth century people slaughtered each other for other reasons. People slaughtered each other over theology: eternal paradise or damnation. People slaughtered each other over power: who gets to be top dog, and to command the material resources of society. But only in the twentieth century have people killed each other on a large scale in disputes over the economic organization of society.

Pointer from Mark Thoma.

I will be seeing Brad and Mark on April 11-12 at this event. It is not a public event, but I will have a bit of time off from the conference. If you are in the Kansas City area and want to try to arrange to meet at some point, leave a comment.

A Book I Will Not Review

Because I contributed a chapter. The book is the Routledge Handbook of Major Events in Economic History, edited by Randall E. Parker and Robert Whaples. The handbook tends to be U.S.-centric, with some surprising exceptions, such as a brief, fascinating chapter on World War I. The chapters are predominantly about events in the twentieth century. The book is priced out of your range, unless you are a library.

My chapter is called The 1970’s: The decade the Phillips Curve died. My main point is that except for the 1970’s, the Phillips Curve has performed really well. However, because of the 1970’s, macroeconomics went through great contortions from which it has not recovered.

This is not to say that we should go back to the macroeconomic consensus as it existed in 1970 (although that is where I see Paul Krugman coming out). But I do not think that the macroeconomic consensus as it existed in 2007 was any better. Hence PSST.

There is another chapter on the 1970’s by Robert Hetzel, which covers much of the same ground as my chapter. And he got to use graphs, which makes me jealous (my chapter would have worked better with graphs).

He and I would differ in our interpretation of the 1980-1983 period. Hetzel writes,

The Volcker-Greenspan FOMCs succeeded in controlling inflation without the need to engineer periodic bouts of high unemployment.

But we had the highest spike in the unemployment rate since the Great Depression–higher than the peak unemployment rate in 2009.