Fantasy Intellectual Teams: version 2.0

The goal of the Fantasy Intellectual Teams project is to improve public discourse by highlighting writers and podcasters who model high-quality discourse. Version 1.0 has been in progress since April 1. I am excited by the way that the initial buzz has shown that it is a viable method for working toward that lofty goal. We have also learned enough to be able to plan a much-improved version 2.0 for May 1.

The initial set of scoring categories, while a good start, will be improved. Because these categories are so important to achieving the goals of this project, we will develop new versions until we are satisfied. In that sense, the project is in Beta.

For those of you who are interested in the project–and I hope that many of you are–I plan to roll out version 2.0 in about a week, with a new draft around May 1. I encourage you to participate as a team owner by picking a team in the next draft. Leave a message in the comments if you are willing to play.

We will be having each owner pick a team of 7 intellectuals to follow. This is down from 15 in version 1.0, which was too much of a burden on owners. Also, as long as the project is in Beta, seasons will last just one month each. Longer seasons will be desirable once the scoring categories have stabilized.

As an owner, you have two responsibilities. One is to follow the team that you draft and submit claims when intellectuals on your team score points. Another is to provide feedback on improving the way that the project works, especially in defining the scoring categories. The experience of scoring your own team is a valuable basis for feedback.

Below are some initial ideas for scoring rules for version 2.0.

We are trying to find a balance between the seriousness of the goal and trying to create a fun game. The fun comes from owners competing by picking teams of public intellectuals and having the intellectuals score points during the season, in the way that individuals who are not in professional sports enjoy competing by picking fantasy teams.

As commissioner of the league, my decisions are final. But I value input from owners and from others, especially concerning problems that arise with the scoring categories.

A scoring category should be a quantifiable indicator of the qualities we want in an intellectual. Qualitatively, what we seek to do is reward those who:

–put forth interesting ideas for discussion
–try to persuade someone who might disagree, rather than simply rile up people who already agree
–demonstrate a willingness to face up to the weaknesses of one’s own position and the best points that can be made by someone on the other side
–make careful statements and test them against evidence

A scoring category also should be specified precisely enough that most of the time it is clear whether or not a point has been earned.

The system for ranking teams, called Rotisserie scoring, ensures that each category matters. An owner cannot dominate a league by just picking intellectuals who pile up points in a single category.
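
To make the mechanics concrete, here is a minimal sketch of Rotisserie ranking. It assumes the standard fantasy-sports convention, which this post does not spell out: in each category the top owner earns as many standings points as there are owners, the next owner earns one fewer, and tied owners split the points for the places they occupy. The owner names, the counts, and the function name `rotisserie_points` are hypothetical.

```python
from collections import defaultdict


def rotisserie_points(category_totals):
    """Convert raw category counts into standings points.

    category_totals maps category -> {owner: raw count}.
    Assumed convention: with n owners, first place in a category is
    worth n points, second n - 1, and so on; ties split the points.
    """
    standings = defaultdict(float)
    for totals in category_totals.values():
        n = len(totals)
        # Order owners from the highest raw count to the lowest.
        ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < n:
            # Find the group of owners tied at this raw count.
            j = i
            while j < n and ranked[j][1] == ranked[i][1]:
                j += 1
            # Places i..j-1 are worth n-i, ..., n-(j-1); share them equally.
            shared = sum(n - k for k in range(i, j)) / (j - i)
            for owner, _ in ranked[i:j]:
                standings[owner] += shared
            i = j
    return dict(standings)


# Hypothetical three-owner league with two categories.
narrow = {"B": {"Ann": 3, "Bob": 2, "Cy": 1},
          "S": {"Ann": 0, "Bob": 2, "Cy": 1}}
# Same league, except Ann piles up 100 more B's.
wide = {"B": {"Ann": 103, "Bob": 2, "Cy": 1},
        "S": {"Ann": 0, "Bob": 2, "Cy": 1}}

print(rotisserie_points(narrow))
print(rotisserie_points(wide))
```

Under these assumptions both leagues produce identical standings, and Bob finishes ahead of Ann even though Ann leads the B category by a wide margin.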

Claims for points must be submitted to the commissioner (me). The claims must include links to supporting evidence and quotes of the relevant statements.

Here are some ideas for categories in version 2.0. The “meme” category of version 1.0 would be replaced by “discussion starter” in version 2.0. The other categories, “steel-manning” and “bets,” will be tweaked and clarified.

In what follows, when I say “you” I mean the intellectual on someone’s team.

(S) steel-manning. Undertake an extensive criticism of a point of view in which you attempt to articulate the strongest case for the view against which you are arguing. For example, Scott Alexander’s book review of Freddie deBoer.

If a player uses a disparaging label, it is almost surely not steel-manning. For example, saying that Kling is a “free-market fundamentalist” applies the disparaging label “fundamentalist” and is incompatible with steel-manning.

If someone wants to steel-man my views on markets vs. government, they would say something like “Kling’s case for markets rests on the view that government faces a knowledge problem and an incentive problem” and then show an understanding of what the knowledge problem means and what the incentive problem means.

(C) caveats. You earn a C for taking a skeptical look at your own point of view, pointing out the weaknesses in it. When I say that I am skeptical of monetarism, I can earn a C by pointing out that without a monetarist approach I have a difficult time explaining the U.S. inflation/disinflation of the 1970s-1980s. (That is if I were eligible to be drafted, which I am not). When Megan McArdle defends vaccine passports but points out many arguments against them, that is worth a C point.

The requirement for a C is not as stringent as the requirement for an S. You do not have to spell out in detail the strongest arguments against your point of view. It is possible that you could earn both a C and an S at the same time. Perhaps it will turn out that all S’s will also be C’s, but not the other way around. In baseball, all home runs are hits, but not the other way around.

(D) discussion-starter. This could be an idea that becomes a topic of discussion. You could put the idea forth in an essay, a blog post, a podcast, a tweet, or a book. Examples: Tyler Cowen’s “state-capacity libertarianism”; Larry Summers’ “secular stagnation”; Eric Weinstein’s “Intellectual Dark Web”; a Joseph Henrich book; a Bari Weiss essay.

The evidence for a D point, called Discussion Material, consists of essays or articles or podcast segments from a source other than the person for whom the point is to be awarded. The source must be respectable, not just an obscure person known only to the team owner.

The discussion-starter does not need to be a catchy phrase. But use of a specific catch-phrase can help to identify you as the discussion starter in the Discussion Material. Alternatively, if the Discussion Material links to one of your essays or blog posts or podcasts, then this shows that it applies to you.

Your idea may have first appeared prior to the season (Henrich’s WEIRD is eligible any time), but the discussion must take place during the season, as evidenced by the date(s) on the Discussion Material.

If the Discussion Material is written, then it must include at least 1500 words that pertain to the discussion-starter. If the discussion takes place in a podcast or YouTube video, then a segment at least 20 minutes long devoted to the discussion-starter is required. Combinations of audio and written Discussion Material can work (e.g. 10 minutes of audio from one source and 750 words of an essay from a different source).
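
One way to read the length requirement is as a single threshold that written and audio material fill proportionally. The sketch below assumes that linear reading, which is consistent with the example of 10 minutes plus 750 words but is not stated as the official rule. The function name and the tuple layout are hypothetical.

```python
WORDS_REQUIRED = 1500    # written threshold stated in the rules
MINUTES_REQUIRED = 20    # audio/video threshold stated in the rules


def meets_length_requirement(sources):
    """Check whether combined Discussion Material is long enough.

    sources is a list of (words, minutes) tuples, one per piece of
    Discussion Material. Assumption: each piece counts toward the
    requirement in proportion to its share of the relevant threshold,
    and the shares add up across pieces.
    """
    total_words = sum(words for words, _ in sources)
    total_minutes = sum(minutes for _, minutes in sources)
    coverage = total_words / WORDS_REQUIRED + total_minutes / MINUTES_REQUIRED
    return coverage >= 1.0


# The combination given in the rules: 750 words from one source
# plus 10 minutes of audio from another.
print(meets_length_requirement([(750, 0), (0, 10)]))
# A slightly shorter combination falls below the threshold.
print(meets_length_requirement([(700, 0), (0, 9)]))
```

This checks length only. Whether the idea is the central point of the material, and whether some of the praise is non-tribal, remain judgment calls for the commissioner.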

It is not necessary for all of the Discussion Material to appear in a single place. Single-place Discussion Material could be a long book review or a symposium on the topic. But the Discussion Material could also consist of several different commentaries that do not necessarily speak to one another. In either case, the Discussion Material must be more than mere mentions–the discussion-starting idea must be the central point of the Discussion Material.

Most important, at least some of the praise for the discussion-starting idea must be non-tribal. If Nancy Maclean’s Democracy in Chains is only praised by progressives, then she does not earn a D. If the only praise in the material discussing a Victor Davis Hanson essay comes from conservatives, he does not earn a D.

When Alex Tabarrok’s “first doses first” is praised by Ezra Klein, that does help Alex earn a D. As long as other discussions of “first doses first” are sufficient to meet the 1500 word requirement, those discussions do not have to offer non-tribal praise. They could be either criticisms or rah-rah libertarian discussions. By praising Alex non-tribally, Ezra has suggested that Alex started a good discussion. Also, in Scott Alexander’s discussion of Freddie deBoer, the non-tribal praise that Scott offers would earn Freddie a D point.

It is the idea that determines tribal affiliation, not the person. If Glenn Greenwald writes an essay in favor of free speech and only conservatives praise it, then even though he is known as a progressive he would not earn a D point.

(B) Thinking in bets. To earn a B, you must make a prediction with odds that could be translated into a bet. “I think that X is more likely to happen than Y” does not count. Saying that “X will happen with a probability of 0.2” does count. “I think that X has a fifty-fifty chance of being true” counts only if it is clear from the context that you are thinking in terms of a bet at exactly even odds that X is true.

If you give a probability range, the range has to be 10 percentage points or less to count. Thus, “I think that the Republicans have at least a 50 percent chance of re-taking the House” would not count, because the implied range is 50-100, which is too wide to formulate a bet. Saying that they have a chance of 50-59 percent narrows the range, and that could count as a B.
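
The range rule can be written as a simple check. This is a sketch under the assumption that a stated range is measured as the difference between its endpoints in percentage points, and that a single stated probability is a range of width zero. The function name `counts_as_bet` is hypothetical.

```python
MAX_RANGE_POINTS = 10  # widest range, in percentage points, that still counts


def counts_as_bet(low_pct, high_pct=None):
    """Return True if a stated probability is precise enough for a B.

    Probabilities are whole percentages from 0 to 100. A single number
    such as 20 is treated as the range 20-20. Assumption: width is
    high_pct minus low_pct.
    """
    if high_pct is None:
        high_pct = low_pct
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low_pct <= high_pct <= 100")
    return high_pct - low_pct <= MAX_RANGE_POINTS


print(counts_as_bet(20))        # "X will happen with a probability of 0.2"
print(counts_as_bet(50, 100))   # "at least a 50 percent chance"
print(counts_as_bet(50, 59))    # "a chance of 50-59 percent"
```

The check covers only the width of the range. A statement like “more likely than not” gives no usable number at all, so it never reaches this test.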

(O) open to reconsideration. You earn an O if during the season you change your mind on an issue or say that you are reconsidering a point of view, and explain why. Describing a change of mind that took place prior to the season (“I used to be for the Iraq war, but then I changed my mind”) does not count. Describing a change of mind that says you are even more right than you were before (“It turns out that Republicans are even worse than I thought”) does not count.

(R) research summary. You must formulate a specific question, such as “By how much does a higher minimum wage reduce employment?” and summarize the best research on either side of the question.

Undertaking a general critique of MMT would not count for an R. But taking a specific proposition, such as “inflation will only rise if the economy is at full employment” and summarizing the research bearing on that proposition, including the research that most strongly supports it and that which most strongly goes against it, would earn an R.

(P) pair-off. You engage in an extended, fair debate with someone on an issue where you strongly disagree. Andrew Yang and Ben Shapiro debating UBI in a podcast would count as a P. An email exchange that is reproduced in a blog post could count as a P. The issue should be one that can be formulated as a specific question, such as “Should the U.S. adopt a UBI?” If both participants are owned in the league, then each would be credited with a P. But if Shapiro were owned and Yang were not owned, Shapiro would still earn a P.

(L) Lucifer’s advocate. In order to probe another person’s point of view, a player during a podcast or email interview plays Devil’s Advocate. You hear Russ Roberts do this on some of his podcasts, for example.

[updates]

1. A clear example of an L can be seen in the questions asked (in bold face) by James Pethokoukis when he interviews Scott Lincicome. Under version 1.0, I think Pethokoukis earns an S, but under version 2.0 it is an L, not an S.

2. What to do about a bet splat, which I would define as a long list of propositions, each with your odds? See Scott Alexander. I think that to qualify for “thinking in bets,” you have to show what you are thinking. In the linked post, Scott is thinking about the implications of his average overprediction, and that counts for thinking. But I would only give one B for that. When he makes a new list of predictions, he can get a B for every prediction for which he gives at least a two-sentence explanation for why he gives the number he does rather than some other number.

Note that under Rotisserie scoring, if Scott dominates the B scoring, and consequently his owner dominates the B scoring, it makes no difference whether his owner gets 100 more B’s than the next owner or 1 more B than the next owner. The impact on the overall standings is the same.

18 thoughts on “Fantasy Intellectual Teams: version 2.0”

      • I too was thinking about the C category. One often hears/reads a tin-man masquerading as a caveat. That is, a weak version of the caveated position is introduced merely to vanquish it and, thus, strengthen one’s original position. This gives the appearance of even-handedness but is really just a manipulation. Not sure how to adjudicate on this, but it is something that comes up enough to be considered (and incorporated if feasible) in the scoring.

        Ps, I very much commend the spirit of this project, and I hope it has the intended effect for the players, referee, commenters, and silent observers.

  1. One way to look at ways to improve FIT versions is to think about it like Moneyball. The people you want to score high – thus hopefully gain higher status somehow – are those who would have the higher “win shares” in terms of being the kind of people who would, if they had higher status and were more influential, best contribute to the solution of “the real problem” we face. Disagreements on how to precisely define that problem would yield different tailored FIT approaches.

    So, one way to test FIT versions is to pick a public intellectual with views one hopes become higher status, and to see whether your FIT-scoring system would tend to capture their style.

    FIT2 to me lacks two essential features: Good Defense (fair criticism and counter-criticism) and Bad defense (fouls, technical and personal).

    For instance, Robin Hanson does a lot of Good Defense, and his latest, “Response to Suri Re Futarchy” is exactly what you want to see a good public intellectual do, with gentle and civil but still powerful deflections of everything Suri threw at him.

    How does doing that measure up in SCDBORPL? (For you Pabst Blue Ribbon fans, I’ll use “COLD-PBRS”)

    C – low
    O – low
    L – low
    D – low
    P – low
    B – low at ground level, high at meta level?
    R – high formally and previously, but low-ish here – only by reference or brief mention
    S – medium? – a fair restatement of Suri’s points, but only for the purpose of knocking them down.

    So, the overall FIT2 COLD-PBRS score for a blog post like that is low. And yet it should be very high: this is exactly what I want to see public intellectuals with low-status ideas do to their critics, which is to show how easily they can bat away the objections and knock them out of the park.

    Or look at Steve Sailer, the most unjustly maligned blogger-essayist public intellectual of our time. No one is consistently better at “punching up” against the increasingly brazen absurdities in mass media and demonstrating incoherence briefly and in a common-sensical, witty way – precisely what we should want public intellectuals to do more of, and to get more status for doing. Two recent good examples of this are “Biden’s Reparations Printer Goes Brrrrrrrrr” and “NYT: More Than 33 Million Americans Lack Enough to Eat” (an unfortunate stain on DeParle’s career).

    For the latter, the list of synonyms is a hilarious way to reinforce the message that these are not words one could fairly use to describe the poorest Americans, who are not anywhere near the edge of starvation.

    For the former article, the incoherence speaks for itself, but it helps when Steve spells it out: “Wait a minute, I thought the problem was that infrastructure was built in black neighborhoods. Now the problem is it wasn’t built. Which is it?” And in response to the line in the article, “Past projects were often built in communities that did not have the political capital or resources to successfully protest.” Steve pointed out, “But your expert just said: ‘A lot of previous government investment in infrastructure purposely excluded these communities’.”

    Again, that’s what we want people to do. To “keep them honest”, the incentive for mass media should deter this kind of writing because they fear that public intellectuals with status and influence will pillory them when they engage in such sloppy propagandizing.

    But that would score low across the COLD-PBRS board.

    • I like your category about Defense, as some typically thoughtful and measured people lose their minds once criticized. One potential concern (if it’s even a concern at all) is that players who score well across the other categories may not elicit as much criticism as players who score less well.

      Indeed, criticism seems to be more readily delivered to people deemed boorish and/or who espouse specific “objectionable” views. Hell, people often behave boorishly and/or make inflammatory comments to specifically elicit criticism (ie, attention).

      I haven’t thought this through so it may not even be a problem; it was just my free association to your comment. Regardless, I would love to see “Good Defense” raised in status and thus emulated.

  2. handle–

    I don’t think I buy the defense of Sailerism here. What you are essentially describing is a public gadfly, one whose bites are made the more delectable by their humor and wit. But there are dozens of people like this on the social justice left already, and they would score as highly on any objective metric that measures this sort of thing as Sailer does. In a way I think it is partially responsible for the intellectual world we live in now, and it is for fear of being pilloried by them that the mass media writes the way it does. You’ll object that these figures are leading the zeitgeist instead of rebelling against it, but that has not always been true, and in any case is more of an “object” level objection than a “meta” one. As I see it we already have a media environment where pillorying the mass media is a fast ticket to success; it just happens that American intellectual life is not especially sympathetic to your favorite pillorer.

    • Tanner,

      As a fellow fan-boy of Sailer, let me jump in to his defense. I don’t think you are describing our current media environment fairly or how the best Sailer posts respond to this environment. It is clear to anyone with eyes to see (or as Sailer would put it — “and yet it moves!”) that our mass media is full of left-wing crazies who distort reality and push what Sailer calls “the narrative.” Some of his intellectual effort is put into showing that the emperor has no clothes…the social justice left spends their time defending the emperor’s nudity.

    • Nah, that’s an error of over-generalization. You may want to go meta, but the object level doesn’t justify it. Just like in group theory, you can’t go up a level of abstraction without symmetry because of the irreducible complexity of the granular details, and in this case we clearly have the very opposite of symmetry. Who do we fear? Who can ruin our day, or our lives? The Sailers of the world? No way. Weight class matters.

      Some of the gadflies are doing good work and would be more beneficial to the health of our intellectual life if they had more influence, while the others are both nasty and low-quality and are made all the more terrible and deleterious by their intimidating influence. I am likely more favorably disposed to good gadflies from either side than you are, and again, they are not the problem. The problem is not criticism in the abstract, the fear of which is good and healthy, but terrible critics asymmetrically having the power of terror and intimidation at their disposal.

      Not saying it about you or Arnold, but I don’t think you can deny that there are a lot of commentators out there who are far too eager to intentionally and inappropriately jump up one too many levels of generality precisely to avoid mentioning the elephant in the room of who is and isn’t able to deploy the social destruction machine and thus pissing off the left, and so play the “pox on both your houses” / “they are equally bad” game of enlightened transcendence and being above the petty fray.

      Anyway, it’s a big topic. Let’s keep it going.

      Like any group of boys trying to invent a new game, there are going to be multiple rounds of negotiation about revising and refining the rules.

      Now, how can one tell the difference between two alternative rules without coming to any kind of consensus on what makes the game ‘better’?

      In sports, biology has provided the answer in common mental firmware that enables the social psychology necessary for spontaneous coordination under tacit, intuitive assumptions that don’t have to be made explicit.

      All the boys instinctively want the opportunity to show off, for there to be a way to establish a ranking and pecking order, for the best to usually come out on top, for the level of performance to climb to human potential for excellence, for the game to be fair, for most situations to have a clear, objective call (“law over governance”), and to maximize ‘entertainment value drama’ and the tension of excitement derived from the right mix of luck and prowess.

      The best, most entertaining championship games are not humiliating blow-outs but instead those that seem “neck and neck” from beginning to end, and, without having to go to the trouble of making the goal well-defined, common instincts help to generate agreements about the rules that tend to maximize those kinds of contests.

      But there isn’t one perfect sport for everybody, because there is no equivalent consensus on questions such as roughness of physical contact or speed or running or amount of scoring or particular athletic skill-sets, etc. Still, the rules for all those different setups tend to evolve in the way I described above.

      But in this game to argue productively over rules we would need to explicitly define what we are trying to maximize for, which would in turn require agreement on the whole diagnosis of what has gone wrong.

  3. I’m a scientist, so data is my primary currency. So I’m very much predisposed in favor of the R category. This category sounds relatively straightforward on its face, but I don’t think it’ll be that easy to adjudicate.

    I’m curious what principles Arnold will be guided by to determine “the best research on either side of the question.” To take a recent example, a few weeks ago there was a post highlighting data from Eric Kaufmann on “Academic Intimidation.” This may very well be “the best” data we have on the topic (I’m not so sure), but it still fell far short of what I would consider rigorous scholarship.

    Among other problems, the samples were far from representative, many of the questions were loaded, and there was a clear ideological motivation to the work. It also lacked peer review (to be sure, I don’t fetishize traditional peer review and am well aware of its pitfalls).

    I say all of this NOT to disparage the project or Eric Kaufmann – indeed, in many ways, I commend him for putting his money where his mouth is (so to speak). Nor am I saying that his conclusions are wrong – rather, I’m saying that many of his research methods were subpar.

    I bring this up to highlight a challenge inherent to the R category of FITs and to hopefully hear from Arnold (and/or others) how he will approach this. What if the “best” (ie, only) research on a given matter obscures rather than enlightens?

    • I think that the best R points would be earned by someone like Scott A., who looks at studies on both sides of a question and comments on methodological weaknesses in those studies and tries to explain why they come to different conclusions.

      • If I am understanding this, you (as referee) will not so much be focused on reviewing the quality of the presented research. Rather, you will focus on whether players “show their work” by presenting (summaries of) methodological details of the various studies and articulating how these details inform the topic (and their thinking on it).

        I think this is more workable than what I initially had been imagining wherein you were judging the quality of the research. I don’t doubt your ability as a judge per se, rather I question whether you would have the bandwidth to be said judge for this project.

  4. I would love to see Arnold (or others) apply the FIT scoring criteria to commenters.

    • Comments are too restrictive and short to allow for adequate opportunity to score points along all the dimensions. Discussion material and Pairing are especially tough, and research summary close. Still, one could try to assess people in terms of impressions of ranking their strengths and weaknesses.

      If I were to try and rank factors for scoring potential for my comments (not just here, but elsewhere) I’d guess LBSCORPD. I still think P needs work, people with worthy but low status ideas are going to find it hard to match and find pair partners. When Rogan went to Spotify, they took a few dozen of his old shows out of their collection. Lets everybody else know to stay away from those guys too.

      • I agree that comments are too restrictive and short to apply the entire scoring system. They might lend themselves better to a single scoring category – offenses (or penalties). Instead of adding up all the positive points scored, one could simply add up all the offenses committed.

        What this scoring approach lacks in reinforcing good behavior, it makes up for in ease of implementation.

  5. Another thing to consider about Sailer is that he used to write a lot more data-driven and intellectually respectable long-form journalism and essays back in the old NR days before Buckley stood athwart history, lost his nerve, and said “fine, go ahead”. “Is love colorblind?” is a good example, but 24 years ago is a foreign country.

    Public intellectual life is being “in the arena”. Just as in war, the optimal way to fight depends on the power imbalance.

  6. I’d like to sign up.

    I’ve read through these rules. I believe I understand what is expected. I have various intellectuals in mind that I’d like to draft, and it will give me an excuse to read them more.

    • Great! I’ll add you to the owners’ list. We are going to have some owners’ meetings on Zoom starting this Monday, in order to brainstorm version 2.0
