Disaggregating the economy using big data

When do you suppose the following sentences were written?

Should we worry about a computerized creation that plays to our unconscious? How vulnerable are we to these increasingly refined sales pitches?

They come from Michael J. Weiss, on p.25 of his book The Clustering of America. It is a mostly-favorable treatment of the use of big data to sort American zip codes into socioeconomic clusters, to help businesses make better use of direct-mail marketing and local advertising. The data also were used by political organizations to target efforts to get out the vote, solicit donations, and tailor messages.

The book appeared almost thirty years ago, in 1988. I read it when it first came out, and I recently ordered it so that I could read it again. I also ordered a follow-up book that Weiss wrote in 2000, called Our Clustered World. I will have more to say about the two books when I have finished. I am interested in what they contribute to the project of disaggregating the economy, meaning treating the U.S. as a collection of diverse economies that trade with one another.

One side note: In the late 1990s, when I was running my commercial web site providing information to people who were relocating, we contacted a company that had a similar cluster analysis, in order to enable users to search for particular types of towns. For example, you could select a place where you lived (or wish you lived) near Baltimore and then look for the three most similar towns near, say, Los Angeles. The application would take the socioeconomic cluster that you started with and match you with a part of Los Angeles that had a similar socioeconomic cluster.
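The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the vendor's method: the place names, the three-number indicator profiles, and the use of Euclidean distance are all assumptions made up for the example.

```python
from math import sqrt

# Hypothetical data: each place has a metro area and a made-up profile of
# three socioeconomic indicators scaled to 0..1. None of this is real data.
PLACES = {
    "Baltimore suburb A": ("Baltimore", (0.70, 0.55, 0.40)),
    "LA area 1": ("Los Angeles", (0.72, 0.50, 0.42)),
    "LA area 2": ("Los Angeles", (0.60, 0.45, 0.50)),
    "LA area 3": ("Los Angeles", (0.30, 0.20, 0.80)),
    "LA area 4": ("Los Angeles", (0.85, 0.70, 0.30)),
}


def profile_distance(a, b):
    """Euclidean distance between two indicator profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(start, target_metro, k=3, places=PLACES):
    """Return the k places in target_metro closest to the starting place."""
    _, start_profile = places[start]
    scored = [
        (profile_distance(start_profile, profile), name)
        for name, (metro, profile) in places.items()
        if metro == target_metro
    ]
    # Sort by distance (ties broken by name) and keep the k nearest.
    return [name for _, name in sorted(scored)[:k]]


print(most_similar("Baltimore suburb A", "Los Angeles"))
```

A real cluster system would presumably assign each zip code to a named cluster first and match on the cluster label; the distance-based version here is just the simplest way to show the idea of "three most similar towns."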

The company provided us with their data on a couple of CDs, and for us, loading it and putting up a front-end that could do the searches the way we wanted was a technical project. Probably the biggest challenge was creating a way to search by town name as well as by zip code.

Shortly after the application went live on the web, I received a very angry note from a civil rights organization. The data for each socioeconomic cluster included the two or three consumer items that were purchased much more in that cluster than in other clusters. Our application spat out that information, along with the other data about location. It turned out that one cluster’s unusually strong consumer propensities included fast food fried chicken. Someone evidently had done a search that caused this cluster description to appear and contacted the civil rights group about it. The note that they sent us accused us of stereotyping the location as African-American, so that we were promoting segregation and redlining.

Of course, the company was not using racial stereotyping to speculate on consumer propensities. All of the consumer propensities that the company identified were data driven. If this was a stereotype, it evidently had a basis in reality.

We decided that it was appropriate to edit out that particular example, and just leave in the consumer propensities that did not have any racial connotations. As I recall, we looked in the cluster descriptions for other examples of consumer propensities that might have ethnic connotations, but we did not see any.

Disaggregating the economy: product scanner data

David Argente, Munseob Lee, and Sara Moreira write,

In this paper, we exploit detailed product- and firm-level data to study the sources of innovation and the patterns of productivity growth in the consumer goods sector over the period 2006Q3–2014Q2. Using a dataset that contains information on the products of each firm and the characteristics of each product, we document new facts on product reallocation. First, we find that an important component of reallocation of products happens within the boundaries of the firm. Second, the largest changes in product quality come from new firms launching new varieties and from small firms expanding to other product lines. Third, we document that product reallocation within firms is procyclical. Fourth, we find that within-firm product reallocation is larger in high productive firms and firms that invest more in R&D. Finally, we quantify how important how product reallocation affects firm-level productivity growth and innovation as reflected by changes in their total factor productivity.

NOTE: I am quoting a version that is labeled VERY PRELIMINARY AND INCOMPLETE. The published version is gated. Pointer to the published version from Tyler Cowen.

Apparently, during the recession, firms reduced the pace at which they added new products and retired existing products. I interpret this as a slowdown in investment.

As a first approximation, this is not supportive of my view of a recession as a breakdown of existing patterns of specialization and trade. One would expect to see an increase in the retirement of existing products if my view were correct.

One way to rescue my view would be to say that firms respond to a deterioration in the sustainability of existing patterns of specialization and trade by reducing their investment in creating new patterns. This seems like a counterproductive response, except that it does conserve cash in the short run.

Disaggregating the Polity: Colin Woodard

Woodard wrote a book called American Nations, which I just read for the first time. He offers a model of America as having a culture that can be thought of as eleven different nations, each dominant in particular geographic regions. It seems to me that someone should have pressed me to read this book before now. I will be recommending it often in the future, I am sure.

Woodard sees a centuries-long struggle for power between the nation he calls Yankeedom (New England) and the two nations that he calls Tidewater and Deep South. His antipathy toward the latter shows through, especially in the final chapters of the book.

More recently, he has some essays that I am checking out. In this essay, he claims that the urban-rural divide is simplistic and wrong, and that his 11-nations model works better.

In five of the regional cultures that together comprise about 51 percent of the U.S. population, rural and urban counties always voted for the same presidential candidate, be it the “blue wave” election of 2008, the Trumpist storm of 2016, or the more ambiguous contest in between. In Greater Appalachia, the Deep South, Far West, and New France, rural and urban voters in aggregate supported Republican candidates in all three elections, whether they lived in the mountain hollers, wealthy suburbs, or big urban centers. In El Norte, both types of counties always voted Democratic, be they composed principally of empty desert or booming cityscapes.

…The stark urban-rural divide in the country is to be found almost exclusively in the Midlands, where it has a disproportionate effect on the Electoral College, as that region straddles several historic swing states: Pennsylvania, Ohio, Iowa, and Missouri among them.

I am curious to delve into his 11-nations model and to consider each nation in economic terms. Are there likely differences in what they import and export? Differences in wealth? etc. Here is a first pass, using nine of his nations (omitting New France, which is mostly in Canada, and First Nation, which covers locations with a lot of Native Americans). The table offers my impressions of the leading industries in the various regions.

Nation (Woodard’s name) | Typical Cities                | Major industries
Yankeedom               | Boston, Madison               | Higher education, high tech, health care
New Netherland          | New York City, Greenwich Ct.  | Financial services, entertainment, international trade
Midlands                | Philadelphia, Peoria          | Agriculture, manufacturing
Tidewater               | McLean, Newport News          | Federal government, military
Greater Appalachia      | Wheeling, Muskogee            | Extractive (mining, forestry, etc.)
Deep South              | Charleston, Mobile            | Agriculture, manufacturing
El Norte                | El Paso, Tijuana              | Extractive, retirement services
Left Coast              | San Francisco, Portland, Ore. | High tech, international trade
Far West                | Bozeman, Rapid City           | Extractive, tourism

I am wildly guessing about the industries for El Norte. I think he wants to limit it to the southwestern U.S. (plus northern Mexico), and he wants to exclude southern Florida.

I am not sure where Los Angeles fits in his scheme. It must be an amalgam of some sort. Some of New Netherland, with its ethnic diversity, ambition, and glamour. Some of El Norte, with its Hispanic population. Perhaps an element that is Far West, where there is dependence on government investment combined with resentment of government.

Any other criticisms or suggested modifications to my industry guesses are welcome.

Disaggregating the economy: Yelp data

Edward L. Glaeser, Hyunjin Kim, and Michael Luca write,

Our results highlight the potential for using Yelp data to complement CBP by nowcasting – in other words, by shedding light on recent changes in the local economy that have not yet appeared in official statistics due to long reporting lags. A second potential use of crowd-sourced data is to measure the economy at a more granular level than can be done in public facing government statistics. For example, it has the potential to shed light on variation in economic growth within a metropolitan area. In Section V, we turn to New York City to see how Yelp does at measuring the micro-geography of a municipality. Yelp does seem capable of tracking the evolution of neighborhoods even below the ZIP code level.

CBP = County Business Patterns, a government statistical publication.

I am interested in the potential to use new data sources to decompose the U.S. economy into regions. These might be actual regions, like the Mid-Atlantic, or virtual regions, like the major metros on the two coasts.

Disaggregating the economy: cost of living

Timothy Taylor writes,

here are the US states color-coded according to per capita GDP with an adjustment for Regional Price Parities: that is, it’s a measure of income adjusted for what it actually costs to buy housing and other goods. With that change, California, New York, and Maryland are no longer in the top category. However, a number of midwestern states like Kansas, Nebraska, South Dakota, and my own Minnesota move into the top category. A number of states in the mountain west and south that were in the lowest-income category when just looking at per capita GDP move up a category or two when the Regional Price Parities are taken into account.

Taylor’s post indicates that the Bureau of Economic Analysis has some very interesting data on output and prices down to state and local levels. This would really help with a project of disaggregating the economy. Here is a recent press release about the data.
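The kind of adjustment Taylor describes can be sketched in a few lines. Regional Price Parities are expressed as a percentage of the national price level (100 = national average), so dividing nominal per capita GDP by RPP/100 gives a price-adjusted figure. The state names and numbers below are hypothetical placeholders, not BEA data.

```python
def rpp_adjusted(nominal_per_capita, rpp):
    """Deflate a nominal per capita figure by a Regional Price Parity.

    rpp is a percentage of the national price level (100 = national average).
    """
    if rpp <= 0:
        raise ValueError("RPP must be positive")
    return nominal_per_capita / (rpp / 100.0)


# Hypothetical inputs: (nominal per capita GDP in dollars, RPP).
states = {
    "State A": (70_000, 115.0),  # high income, high cost of living
    "State B": (60_000, 90.0),   # lower income, low cost of living
}

adjusted = {name: rpp_adjusted(gdp, rpp) for name, (gdp, rpp) in states.items()}

# Rank states from highest to lowest price-adjusted per capita GDP.
ranking = sorted(adjusted, key=adjusted.get, reverse=True)

for name in ranking:
    print(f"{name}: {adjusted[name]:,.0f}")
```

With these made-up inputs the lower-cost state overtakes the higher-income one after adjustment, which is the pattern Taylor reports for several midwestern states.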

Equity without capital, twenty years later

I received a review copy of Capitalism without Capital: The Rise of the Intangible Economy, by Jonathan Haskel and Stian Westlake, which has a 2018 copyright date.

1. My first reaction is to be a bit miffed that my name is not in the index. Nick Schulz and I wrote a book on the intangible economy, and the first edition appeared in 2009. Going back even further, in 1998 I wrote an essay called Equity without Capital. That essay is still interesting to read, and it anticipated some of the central issues in their book. But probably fewer than 200 people saw it when I wrote it.

2. Hal Varian and Carl Shapiro aren’t in the index, either. That is a less forgivable omission. Information Rules sold well.

3. I hurried through the book, and I was inclined to give it a mixed review. But when I re-read it, I only re-read the sections that I liked the first time. I decided that those sections are really good. Now I am inclined to give the book a strong recommendation.

4. The organization of the book is excellent. The good news is that you usually can skip to the end of the chapter and read its conclusion to get the main point. The bad news is, well, why not just condense the book into an extended essay? And if you left out the sections of the book that did not do much for me, the extended essay would work even better.

Gosh, I am really being hard on them, for some reason. It really is a first-rate book. I’m not sure why I keep wanting to talk about what I don’t like about it. But, here I go again:

5. They make a big deal about recent literature that arrives at measures of intangible capital. However, as they point out, such measures are fraught.

Their analysis says that intangible capital has four s’s: sunk costs (investments in assets that cannot be re-sold); scale (network effects and path dependency can bring very high returns); synergies (combinations of ideas are worth more than the ideas are worth separately); and spillovers (ideas are easily copied or imitated).

This implies, as they recognize, that intangible capital can be worth much more than what it costs to obtain, because of scale and synergies. But it can be worth much less than what it costs to obtain, because of sunk costs in non-marketable assets. In bankruptcy, you can sell off the office furniture and the fleet of trucks (tangible assets), but not the business process that proved unsustainable or the failed attempt to establish a brand (sunk costs).

But the measures of intangible capital use acquisition cost as the measure of investment in intangible capital. That may be a reasonable way to value tangible capital. But to me, their four s’s imply that intangible capital’s value cannot be reliably represented by its acquisition cost.

To get technical, Tobin’s q is the ratio of the market value of capital to its replacement cost. Think of it as the ratio of the stock price of a firm to the acquisition cost of its assets. For tangible capital, q should be close to 1. But for firms with a lot of intangible capital, like The Four, it is much, much greater than 1. Tyler Cowen’s recent column, Investors are celebrating the tech revolution, says that the current high values of q are a positive signal about future economic growth.
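As a back-of-the-envelope illustration of the ratio, here is a minimal sketch. The two firms and their dollar figures are invented for the example; the point is only to show how q behaves when most of a firm's value does not show up in the acquisition cost of its assets.

```python
def tobins_q(market_value, replacement_cost):
    """Tobin's q: market value of capital divided by its replacement cost."""
    if replacement_cost <= 0:
        raise ValueError("replacement cost must be positive")
    return market_value / replacement_cost


# Hypothetical firms (figures in billions of dollars, made up for illustration).
# A tangible-heavy firm: the market values it near what its assets cost.
q_tangible = tobins_q(market_value=105.0, replacement_cost=100.0)

# An intangible-heavy firm: the market value far exceeds measured asset cost.
q_intangible = tobins_q(market_value=800.0, replacement_cost=100.0)

print(f"tangible-heavy firm:   q = {q_tangible:.2f}")
print(f"intangible-heavy firm: q = {q_intangible:.2f}")
```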

Of course, for many dotcom stocks in the 1990s, q shot way up before dropping to zero, which is what my essay was predicting. But by the way, one of the stocks I was skeptical about back then was Amazon, and if you held onto that, the losses on the rest of your ’90s dotcom portfolio might not trouble you.

Looking at this balance between superstar value and failure, the authors propose that, well, on average, the value of intangible capital for the whole economy ought to be somewhere close to what it costs. I thought they were just hand-waving at that point.

They understand well enough that intangible capital is not exactly like tangible capital in the neoclassical model. But I do not think that they are as ready as I am to take the next step and jettison the neoclassical framework.