Why does the Google News algorithm lean left?

Nicholas Diakopoulos writes,

Our data shows that 62.4 percent of article impressions were from sources rated by that research as left-leaning, whereas 11.3 percent were from sources rated as right-leaning. 26.3 percent of impressions were from news sources that didn’t have ratings. But even if that last set of unknown impressions happened to be right-leaning, the trend would still be clear: A higher proportion of left-leaning sources appear in Top Stories [on Google].
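
The worst-case bound in that passage can be checked with a few lines of arithmetic. This is only a sketch using the three percentages quoted above; the variable names are mine, not the study's.

```python
# Shares of Top Stories impressions as quoted above (percent).
left = 62.4
right = 11.3
unrated = 26.3

# The three shares should account for every impression.
total = left + right + unrated

# Worst case for the "leans left" reading: assume every unrated
# impression actually came from a right-leaning source.
right_worst_case = right + unrated

print(f"total = {total:.1f}")
print(f"right-leaning, worst case = {right_worst_case:.1f}")
print(f"left still ahead: {left > right_worst_case}")
```

Even under that assumption the right-leaning share comes to 37.6 percent against 62.4 percent, so the quoted conclusion does not depend on how the unrated sources would have been classified.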

This reinforces my own impression. But I don’t think that the Google News algorithm is constructed in a sinister way. Suppose that the algorithm is designed to put at the top the stories that users are most likely to click on. To the extent that Google’s users tend to prefer left-leaning news sources, this will lead the algorithm to highlight those sources.
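
To make that mechanism concrete, here is a minimal sketch of a ranker that knows nothing about politics and sorts purely on observed click-through rate. Everything in it is hypothetical: the outlet names, the click counts, and the `rank_by_ctr` function are invented for illustration, and nothing here is meant to describe Google's actual system.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str          # hypothetical outlet name
    lean: str          # "left" or "right"; never read by the ranker
    impressions: int   # times a story from this source was shown
    clicks: int        # times users clicked on it


def rank_by_ctr(sources):
    """Order sources by click-through rate, highest first."""
    return sorted(sources, key=lambda s: s.clicks / s.impressions, reverse=True)


# Made-up numbers: this audience clicks left-leaning outlets more often.
sources = [
    Source("Outlet A", "left", 1000, 120),
    Source("Outlet B", "right", 1000, 40),
    Source("Outlet C", "left", 1000, 90),
    Source("Outlet D", "right", 1000, 60),
]

ranked = rank_by_ctr(sources)
top_two = ranked[:2]
for s in top_two:
    print(s.name, s.lean, s.clicks / s.impressions)
```

With these invented numbers both top slots go to left-leaning outlets, even though `rank_by_ctr` never looks at the `lean` field. The slant comes entirely from the click data fed into it.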

Moreover, the news outlets themselves are driven to appeal to a progressive audience. Progressives want the WaPo to give them news with a slant that makes Trump’s impeachment seem imminent, and the WaPo obliges.

In short, I suspect that the reason Google News promotes so much left-leaning outrage porn is that a lot of people want it.

22 thoughts on “Why does the Google News algorithm lean left?”

  1. The bigger issue is the handing over of many ‘decisions’ to complex, dynamic algorithms which are, effectively, completely inscrutable black boxes. This is especially so for ‘big data’ / ‘AI’ / ‘machine learning’-based algorithms, which rely on massive, constantly changing databases which only a few major companies are able to access.

    Any organization can always deny these suspicions and accusations by saying that some decision was simply the result of ‘just the algorithm’ instead of a human putting a thumb on the scale according to his or her own preferences or agenda, and there is usually no way to prove otherwise.

    The left has already started objecting to the use of such algorithms for decisions on investigation, prosecution, and sentencing, because, predictably, the computers produce disparate impact. Their suspicion is that the programmers were racists – either with self-awareness or subconsciously and implicitly – and wrote the software to produce racist outcomes. So the software must be ‘recalibrated’ to produce the ‘correct’ outcomes, with the right racial quotas.

    Imagine the left’s reaction to using the same argument against Google or Facebook. “Well, your news algorithm was written by progressives with either intentional or subconscious bias, but with the effect of producing improperly biased presentations of news coverage. So, your algorithms must be recalibrated to show no less than 40% conservative sources in your first page of linked-headlines.”

    • The bigger issue is the handing over of many ‘decisions’ to complex, dynamic algorithms which are, effectively, completely inscrutable black boxes.

      This is a big issue, but a different one. “Liberal bias in the media” was a staple talking point years and years ago when AI/deep learning in its modern form wasn’t used and didn’t even exist. As for “algorithms” (DL AI aren’t really algorithms; an algorithm is by definition not a black box!) as black boxes, doesn’t their black-boxiness differ only in degree from the black-boxiness of, say, the IPCC process or of the intelligence community? It is perhaps somewhat more feasible to fish information out of the latter’s processes with FOIA requests, public data scraping and whatever than it is to fish decision paths out of a network, but in practice in both cases the effort required is too large to be practicable.

  2. I suspect that the lion’s share of news articles are slanted to the left so that a completely random selection would be as well.

    • Agreed. Marginal Revolution posted about a hockey-stick explosion at the NYT [https://marginalrevolution.com/marginalrevolution/2019/06/the-nytimes-is-woke.html] in what would be called oppressor-oppressed language in Kling’s Three-Axes Model. That’s the language that progressives use to appeal to each other. Thus, hockey-stick growth in oppressor-oppressed language is likely hockey-stick growth in articles written by progressives to appeal to progressives.

      Similar searches for liberty-coercion (libertarian) language, such as choice, consumer, regulation, freedom, state, government, tax, and for civilization-barbarism (conservative) language, such as family, values, moral, foreign, invader, tradition, Christian, reveal no similar hockey-stick pattern.

      (Political) climate change at the NYT is real and progressive-made.

  3. From the linked piece: “The ratings [of outlets as liberal/conservative] don’t measure the slant of the media outlet per se, but rather reflect the self-reported political affiliation of Facebook users sharing content from those sources.”

    Seems like a problem for this interpretation. The actual story here seems to be that mainstream news outlets are more commonly linked to on Facebook by liberals than by conservatives.

  4. This is simply not correct.

    Almost no one worries about the perspective of the programmers. That part is documented in the code and is easy enough to monitor.

    AI is designed to react to pattern recognition in data. That is what separates it from normal programming, where most of the logic (or bias) is to be found in the code. The data dictates the behavior. Often, that data is collected about human activities, and it is possible for AI to detect various forms of bias and amplify it.

    More sophisticated AI will inevitably produce unexpected patterns of decisions because the data dynamics are complex. That’s the whole point.

    The big problem is autonomous decision making without checks and balances, not AI per se. We wouldn’t like it if a person was making these decisions either, if they were completely opaque and unaccountable for their decisions.

    • Data doesn’t descend all pure and ready to use from heaven. Selection, preprocessing and attributing of training/learning data, which is what enables AI to do useful work, is done by humans. AI training process then learns patterns in data corresponding to given attributes. Undirected learning — where AI establishes categories of attributes on its own — is not used very often (if at all). This approach makes claims of potential bias in data preparation plausible, even though I doubt that anybody actually does that. The left’s consensus is by definition to the left of reality, ergo observations of the latter will have a right-wing “bias” relative to the former.

  5. Left-leaning compared to what? Google News reflects the content of the mainstream news sites, it does not produce news itself. And as Moldbug explained over a decade ago, there is no such thing as liberal media bias (his italics):

    The idea that the “mainstream media” suffers from “liberal bias” is very typical of the conservative pathology. …

    [A]nyone with an IQ over 80 knows that the “mainstream media” is in fact the US’s official press. No, the New York Times and CNN are not formal government agencies, like the BBC or Tass. Technically, journalists are corporate serfs like the rest of us. But in fact, as Walter Lippmann pointed out in 1922, who controls public opinion has the nation by its balls, and woe unto any mere CEO who dares to screw with them. …

    From the conservative perspective, this is hardly the end of the world. The problem is just that almost everyone who works for the Department of Information is, for some reason, a liberal. So the solution is just to have conservative voices in the media. Or even a new conservative media—Fox News, the Washington Times, the New York Sun. After all, just as many Americans are conservatives as liberals, so they should have a right to get the story from people they agree with.

    This is exactly why conservatives keep losing. They see their problems as solvable. True, they never seem to get solved. But it can’t hurt to keep trying, can it? Well, actually, it can, because it maintains the illusion that the game is competitive.

    To see why it’s not, it’s interesting to look at progressive views of the same problem. From the[ir] perspective … the media is actually hopelessly conservative. And when you read them you understand why. It is conservative because it is more conservative than they are, and since they are right, this can only be explained by insidious media monopolies, which have reduced the supposedly independent, courageous crusading journalist to a mere corporate shill.

    … In the English language as we use it now, words like “progressive” and “conservative” are actually relative designations. “Progressive” means “left of the mainstream” and “conservative” means “right of the mainstream.” When the mainstream shifts, these words have to shift as well, and the result is that many of the radical left-wing ideas of 1907 would be radical right-wing ideas in 2007.

    So when we say the “mainstream media” has a “liberal bias,” what we’re actually saying is that it’s to the left of itself. This claim is obviously false, and Alterman and company are on perfectly safe ground in ridiculing it.

    And this is why conservatives fail. They fail because the real problem is much too large to solve with any of the tools at their disposal. So instead, they invent a fake problem, which is unsolvable because it doesn’t exist. This gives conservatives something to do, and it gives people who don’t like the system someone to vote for.

    But the whole conservative movement serves the same purpose as the toy opposition parties of East Germany—or, again, the Washington Generals. Of course, it works better than either of these entities, because it actually does its job. It convinces Americans that their government is the product of a competitive, adversarial process.

    The real problem is that US public opinion is managed by a cradle-to-grave information system—the apex of what I call the Polygon—consisting of the media, schools and universities. … Furthermore, there is good reason to believe that a stable democracy cannot exist without such an information system, because democracies in which different groups of voters have different versions of reality tend to be rather violent. In fact, since the Internet is starting to route around the Polygon, both to the left and to the right, such pleasures may await us as well (yeah, this sounds familiar – C.) As Lenin put it, you may not be interested in war, but war is interested in you.

    Conservatism, at least as presently constituted, is about as capable of solving this problem as Jar-Jar Binks is of defeating Emperor Palpatine. And this is why I’m not a conservative.

    • Perhaps. But the polygon of information is in the process of falling down. And conservatism works better than progressivism. So buckle up.

    • From a conservative point of view, we just assume the media is lying to us.
      Because they are.
      Any honest person “with an IQ over 70” knows this, but it doesn’t serve the cockroaches on the left to talk about it in public, and in fact obfuscating the fact is part and parcel to their strategy.

    • “Moldbug” is clearly incorrect. First of all, when people discuss media bias, they are generally referring to the media relative to the general population. He seems to define the notion of media bias in a manner that renders the claim that the mainstream media is biased false by definition (the mainstream can’t be to the left of itself). That’s fine if he likes that definition, but then this is just a semantic debate over the meaning of ‘media bias,’ which is pointless anyway.

      Moreover, the media is clearly not a de facto organ of the state. It has its own subculture, its own values, its own agenda, which is sometimes concurrent with that of the state, but also often works against it. The simplistic model that Chomsky promoted where the media simply regurgitates what the state wants it to wasn’t even quite right in the 80s, but by today, the media is less dependent on the state for information than ever (in part because what information is considered ‘socially relevant’ has expanded well outside of what goes on in the halls of government), and the corporate media is less useful to the state than before because it can no longer control the flow of information to the public as it used to be able to. In short, the symbiotic relationship between state and corporate media that Chomsky’s model was based on has waned.

  6. From the original ACM.org article:

    To minimize personalization, automated searches were made using a desktop browser configured with no user history, without being logged-in, and with language set to English. One remaining source of personalization could have come from server location, in this case Ohio. However, previous work shows that location personalization impacts mostly localized services (such as “airport” and “pizza”), and has a significantly smaller impact on more general terms, such as controversial topics and names of politicians [47].

    Ummmm…. well, I’m Canadian and the news sources I see from Google are almost always Canadian. Yet I never see French Canadian language sources so language is another key factor. I see Canadian sources from an Incognito/Private browser window as well so IP address geo-location seems to be used. The diversity of Canadian news sources seems to be very broad including many news radio sites which I’d never read without Google fronting them. What is consistent with all these sites is that they are dedicated to breaking news, either print, radio, television, or pure online. Google is using some heuristic to classify a site as “breaking-news-site”.

    The linked article and original paper (same authors) start off with the assumption that diversity is an important factor. From my perspective, the key elements are the classification algorithm (i.e. what articles from different sources refer to the same story), the national geography, the spoken/written language, and what sites/domains are considered sources of “breaking-news”. Google even provides a “full coverage” link when more than one source covers a story.

    The only diversity questions are how Google selects the 3 or 10 stories that are shown in the Top Stories carousel, and when a story has multiple sources how Google determines which source to front on the search results or news.google.com. I don’t think the methodology used in the paper addresses diversity in a meaningful way.

    When CNN and NYTimes account for 17% of American fronted news stories it is somewhat circular to point out that coverage is left-leaning. I’d be surprised if Google did some kind of sentiment/ideology analysis on these articles and I don’t see the promotion of editorial/opinion articles. I’m skeptical that the fronting algorithm is anything other than a breaking-news-site-rank applied per national geography/language. The interesting algorithm is story classification and Google seems to have mastered this (i.e. all the articles listed in “full coverage” belong together).

    The only semi-real controversy with respect to news bias, in my opinion, is Google’s strong preference for stories that use Google’s AMP (accelerated mobile pages) on mobile devices.

  7. How do we know the searches are not for something gone wrong?
    When something goes wrong in the economy, I generally search the progressive sites for something Krugman did.

  8. “To the extent that Google’s users tend to prefer left-leaning news sources”

    Such bullshit. Most people use google, i.e., there is no clear majority preference for left leaning news.

    Google’s algorithm is left-leaning because the designers are left leaning. Their left-leaning assumptions are BUILT IN. It doesn’t have to be sinister. It could just be standard bias. But we KNOW it is not. We know this from google’s actions to silence right leaning sources by deplatforming right leaning youtubers and by actually labelling right wing sources as “extreme” and “dangerous”.

    The benefit of the doubt is over, Kling. The evidence is in, but in true faith and allegiance to your tribe, you’re pretending it’s all just giving people what they want.

  9. Mr. Kling, you may be brilliant but in this case you are wrong:

    sinister (adj.)
    early 15c., “prompted by malice or ill-will, intending to mislead,” from Old French senestre, sinistre “contrary, false; unfavorable; to the left” (14c.), from Latin sinister “left, on the left side” (opposite of dexter), of uncertain origin. (https://www.etymonline.com/word/sinister)

    The algorithm is totally constructed in a ‘sinister’ way.

  10. Google has a well deserved reputation for having strong left wing political views and for using their technical platform to advance their political views in ways that could arguably be called “sinister”.

    I would consider it a safe presumption that Google tries to advance their political views through their news platform like they do with their other technologies.

    Kling takes the opposite point of view, and gives Google this undeserved benefit of the doubt that they are trying to be perfectly politically neutral with news, which is unwarranted.

    Google finances efforts to improve news quality. They cite NYT as a model news organization. I consider it a very safe presumption, that Google is steering news in a direction aligned with their political viewpoints. Previously, Google worked to fight “fake news”, but when Trump took over the phrase “fake news” in early 2017, Google dropped that and started using different terminology.

    • I agree with this take. In order for me to give Google the benefit of the doubt they would have to demonstrate a general lack of partisanship and bias in their behavior across the board. That’s not what we see when Google executives are literally on stage crying at the results of 2016 political elections, or stating that they will redouble their efforts in subsequent elections.

      It was two years ago that James Damore was fired for writing an internal, non-publicized memo that went into great detail to try and specify ways to be more inclusive of women in Engineering positions based on reliable scientific data. This is not an environment where happenstance is acceptable if it deviates from the Leftist party line.

  11. “Google searches lean left because left-leaning clickers don’t click Right” is very likely true. But in Google’s “virtual monopoly” context, this jackleg tropism, however inadvertent, amounts to censorship– a radical de-emphasis not just on content but on searchable topic, matters of great pith-and-moment your typical grunt-stuff lefty prefers to leave alone. Absent context and perspective, nothing acquired in isolation has any significance whatever.

    In principle, we’d fragment the Google Monster simply to ensure that various profit-driven algorithms do not engender a socio-cultural monocult of any nature– particularly the current doofus, parlous Left’s Rejection Front of common-sense approaches to all naturally occurring, humanely “organic” growth-and-change vs. sanctimonious cultists’ chiliastic Kool Aid Kit.

  12. There most likely are steps in Google’s algorithms that allow for (or require) editorial decision-making (e.g., choice of, or weighting of, keywords that influence placement or suppression of webpages, etc.) and Google, having a very far left workforce, likely does tilt left at those points. However, such points of human influence don’t seem likely to be the major factor. The bias seems overwhelmingly “organic:” most of the prominent media outlets lean left; leftists likely read more articles on average than non-leftists; they may be more responsive to politically charged language, so advertisers may genuinely find them a disproportionally lucrative target. For cultural/institutional reasons, even big name right of center writers – George Will, Greg Mankiw, etc. – publish as much or more in left of center publications than right of center ones, meaning more non-leftists click on leftist sites than non-rightists do on rightist sites. Finally, leftists are almost certainly more likely to report pages or sites that offend them than non-leftists are.

    It’s tempting to attribute the skewed results to a straightforward conspiracy like Google putting its thumb on the scale; it allows for an easy fix, it leaves open the possibility of sympathetic masses eager for an alternative, but I think the reality is that the problem is more cultural and more deeply rooted, and less surmountable. On the producer side: the universities you need to go to to become a nationally relevant journalist are overwhelmingly leftist, and the journalism departments especially so; left-leaning students are more likely to choose journalism as a field; the major employers are already left-leaning, and therefore tend to preferentially hire and promote left-leaning people; even the regions and cities where the employers are located are tailored to progressive values and priorities. Every step of the process selects for left of center people. And on the consumer side of things: people on the left just seem to care more about the political slant of the material agreeing with them than non-leftists, and are more likely to discriminate intensely based on the politics of the media provider. This factor may be rooted in the differences in innate personality types that tend toward progressivism vs. conservatism. It may not be possible to get the kind of people who tend to be conservatives or libertarians to rally behind non-leftist media outlets (and against leftist media outlets) with enough force and in great enough numbers to counteract the leftist momentum.

  13. A bit of clarification.

    …a lot of people want it.

    Well, a lot of people who are going to click through the Google News links. Whereas most people who lean right know we have to actively search out “news” stories written by people who do not actively hate us.

    So, I think you’re generally right. The algorithm is likely not actively picking Left-leaning sources, but is just giving the people who tend to click through the Scooby-Snack they are after.

    But, I guess the question is, how did those sources, and not a more balanced slate, end up being there in the first place?
