The Original Internet Architecture

Tyler Cowen writes,

It remains the case that the most significant voluntary censorship issues occur every day in mainstream non-internet society, including what gets on TV, which books are promoted by major publishers, who can rent out the best physical venues, and what gets taught at Harvard or for that matter in high school. In all of these areas, universal intellectual service was never a relevant ideal to begin with.

The original Internet architecture was “smart ends, dumb network.” The smart ends are the computers where people compose and read messages. The “dumb network” is the collection of lines and routers that transmits the bits.

Suppose you create a message, such as an email, a blog post, or a video. When your computer sends the message, it gets broken into packets. Each packet is very small. It has a little bit of content and an address telling where it is going. The Internet’s routers read the address on the packet and forward it along. In Ed Krol’s metaphor, the Internet routers and communication lines act like the Pony Express, relaying the packet to its final destination, without opening it up to see what is inside. The dumb network transmits these packets without knowing anything about what is in them. It does not know whether the packet is an entire very short email or a tiny part of a video.
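
A toy sketch in Python of what a “dumb” router sees (the field names are illustrative, not the real IP packet format):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str        # destination address: the only field a router reads
    seq: int        # sequence number, used only by the receiving end
    payload: bytes  # a small chunk of content the network never inspects

def forward(packet: Packet, routing_table: dict[str, str]) -> str:
    """A dumb router: read the address, pick the next hop.

    The payload is never examined, so the router cannot tell an
    email fragment from a slice of a cat video.
    """
    return routing_table[packet.dst]
```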

When your computer receives a message, it consists of one or more packets–usually more than one. The computer opens up the packets and figures out how to put them together to form the message. It then presents you with the email, the blog post, the video, or what have you.

A connection between one end and the other stays open only long enough to send and receive each packet. A single message from you may reach me as many packets, and those packets can travel over different paths of the network, so each packet uses a different end-to-end connection. Think of end-to-end connections as intermittent rather than persistent.
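
In the same spirit, a self-contained sketch of fragmentation and reassembly (the payload size is made up; real networks use limits around 1500 bytes):

```python
import random

MAX_PAYLOAD = 512  # bytes per packet; illustrative, not a real MTU

def fragment(message: bytes, dst: str) -> list[tuple[str, int, bytes]]:
    """Split a message into (destination, sequence number, chunk) packets."""
    return [(dst, i, message[off:off + MAX_PAYLOAD])
            for i, off in enumerate(range(0, len(message), MAX_PAYLOAD))]

def reassemble(packets: list[tuple[str, int, bytes]]) -> bytes:
    """The smart end sorts by sequence number, whatever order things arrived in."""
    return b"".join(chunk for _, _, chunk in sorted(packets, key=lambda p: p[1]))

message = b"an email, a blog post, or a video ..." * 200
packets = fragment(message, "203.0.113.7")
random.shuffle(packets)  # different paths, different arrival order
assert reassemble(packets) == message
```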

Some consequences of this “smart ends, dumb network” architecture:

1. The network cannot identify spam. It does not even know that a packet is part of an email message–if it did, spam could be deterred by charging email senders a few cents for each email unless the recipient waives the charge.

2. The network does not know when it is sending packets that will be re-assembled into offensive content. Otherwise, it would be easier to implement censorship.

3. The network does not know the identity of the sender of the packets or the priority attached to them. In that sense, it is inherently “neutral.” The network does not know the difference between a life-or-death message and a cat video.

I get the sense that this original architectural model may no longer describe the current Internet.

–When content is cached on the network or stored in the “cloud,” it feels as if the network is no longer ignorant about content.

–Many features, such as predictive typing in a Google search, are designed to mimic a persistent connection between one end and the other.

–When I use Gmail, a lot of the software processing is done by Google’s computers. That blurs the distinction between the network and the endpoints. Google is performing some of each function. Other major platforms, such as Facebook, also appear to blur this distinction.

The new Internet has advantages in terms of speed and convenience for users. But there are some potential choke points that did not exist with the original architecture.

10 thoughts on “The Original Internet Architecture”

  1. I don’t think that a whole lot has changed technologically. The sites that have been banned in the last few weeks could have been banned ten years ago. What changed is the norms.

    In order to publish content on the internet you need to lease an IP address and a domain name from a big company. The IP address connects you to the network and allows people to find your server and download the content hosted on it. The domain name is an alias used to look up the IP address, so that you can move servers but people can still find you. Up until now, the big companies would only revoke your domain registration or IP address for doing something obviously illegal like hosting kiddie porn or copyrighted Hollywood movies. Now, big companies are revoking domain registration for hosting “hate speech.” The network itself is still agnostic to content during transit; it’s still mostly dumb pipes. But if you are publishing stuff to be readable by the general public, the person in charge of the big company can read it too, and say, “we don’t want to lease a domain name to this hateful organization, we’re rescinding their account.”
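
    To see the role of the domain-name lease directly: the name is just an alias that DNS resolves to the leased IP address. A quick illustration in Python (example.com is a placeholder):

    ```python
    import socket

    # The domain name is an alias; DNS resolves it to the leased IP address.
    ip = socket.gethostbyname("example.com")
    print(ip)  # the numeric address that routers actually use

    # If the registrar revokes the domain, this lookup fails and the site
    # becomes unreachable by name, even though the server may still be running.
    ```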

    The only big change technologically in the last ten years is the rise of content delivery networks. It used to be that the end user would load a site directly from the publisher’s servers. But the problem is that your typical publisher cannot afford to run a beefy server. So if their site has a spike of popularity, or comes under attack by armies of bots and is held for ransom, their site goes down. The solution is to have a big company cache your site, keeping a copy of the pages; they have the beefy servers that can serve the content and survive attacks.
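
    A CDN is, at bottom, a cache sitting between readers and the publisher’s weak origin server. A minimal sketch of the idea (the names and TTL are made up):

    ```python
    import time

    CACHE_TTL = 60  # seconds to serve a stored copy before re-checking the origin
    _cache: dict[str, tuple[float, bytes]] = {}

    def fetch_from_origin(url: str) -> bytes:
        """Stand-in for contacting the publisher's own small server."""
        return b"<html>the publisher's page</html>"

    def cdn_get(url: str) -> bytes:
        """Serve from the beefy cache when possible; bother the origin rarely.

        A traffic spike or bot attack lands on the cache, not the origin. It is
        also why the CDN operator is a gatekeeper: dropping a customer from this
        cache effectively takes the site offline.
        """
        now = time.time()
        if url in _cache and now - _cache[url][0] < CACHE_TTL:
            return _cache[url][1]        # cache hit: origin never contacted
        body = fetch_from_origin(url)    # cache miss: one request to the origin
        _cache[url] = (now, body)
        return body
    ```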

    So for practical purposes, to publish content on the web you need three things: 1) to lease an IP address, 2) to lease a domain name, and 3) to have a content delivery network act as the middleman caching your content. Any one of these three thus becomes a bottleneck that can prevent you from publishing publicly on the web.

  2. The other thing I should add is that there has been an increasing amount of consolidation in terms of how people actually find web sites and access the internet. People increasingly rely on centralized services:

    – Google searches to find sites
    – Twitter/Facebook/Youtube to host, find and share content
    – Google or Apple stores to install mobile apps
    – Chrome/IE browsers, which now actively block malicious sites (based on domain name)
    – Gmail, which has spam filters to block sites based on IP address or domain name
    – Squarespace/Wordpress.com to publish content

    The above services at least can all be routed around. If they block you, there are alternatives for getting your message out (unlike when your IP and domain are blocked, in which case you are kicked off the internet). You can install apps on your phone directly, without using a centralized service. You can use an open source browser or run your own email server that won’t block sites Google deems malicious. But it is a pain in the neck to do so. So for practical purposes, if you are blocked by these centralized providers, you are going to find it much harder to reach 99% of the internet-using population.

    • Most people have always used centralized services to host, share and publish content. AOL, Angelfire, Geocities, Lycos, etc. were the big names in their day. The number of personally hosted sites has been insignificant except for the first few years of the WWW’s existence. Email used to be provided by ISPs and operated with email clients; if your account got complained about and your ISP closed it, you had to switch ISPs. There were centralized search companies before Google, too. I distinctly remember using AltaVista – I believe it was before Yahoo search got big. If you’re looking for changes that might be called qualitative, it is the per-capita volume and breadth of queries passing through the centralized search engines that has exploded. People didn’t use to ask Google how to meet girls, where to buy pizza, what’s the fashionable color of sneakers this season or what to do if the bleeping computer is too sluggish.

  3. I would use a different way to distill the features of internet architecture that are the most relevant for the issues of freedom and censorship. Specifically, I would say that there are two components to what we understand as the traditional freedom of the internet:

    1. The network as a content-neutral service for sending data between computers. Any online computer can contact any other and place a request for data, and the remote destination computer is able to send (or not) any data in response. The phone network, although it works very differently, is not a completely bad analogy: anyone gets a phone number and is free to call anyone else, and if they decide to pick up, they’re free to say whatever they want to each other.

    2. The machines connected to the network are general-purpose computers under full end-user control. The end-users can use them to store whatever files they want, to send out these files on request (if they want to run a server), and — crucially — to request, fetch, and display any files that anyone else has chosen to make available. That’s what the Web basically is.
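
    To make point 1 concrete, here is the whole transaction at its barest, in Python (example.com stands in for any server):

    ```python
    import socket

    # Any online computer can contact any other and place a request for data.
    with socket.create_connection(("example.com", 80)) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        response = b""
        while chunk := s.recv(4096):
            response += chunk

    # The remote end chose to answer; nothing in between judged the exchange.
    print(response.decode(errors="replace")[:200])
    ```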

    Internet censorship can be aimed at either of these:

    1. ISPs and providers of other essential network services can’t (practically and scalably) monitor and censor the data you send and receive, but they can cut off your access completely. This has been true from the beginning, and in this sense, the “routing around damage” freedom has always been illusory. If you’ve incurred the animosity of all available service providers (and it’s a business with high economies of scale that will never have enormous numbers of players), or if they’ve received a government order to cut you off, there’s nothing you can do. (Things have become worse only in that it takes more numerous and complex services to run a server now than in the past, any one of which can cut you off.)

    2. In the second respect, things have become much worse, in that we’re moving away from the model where people can use their devices as general-purpose computers. In the new model, which is already near reality for mobile devices, the user can only choose among a number of “apps” pre-approved by a big gatekeeper (i.e. Google or Apple), and can send and receive data and display content only insofar as these official apps allow. The sort of low-level control over the computer’s programming and storage that still exists on a PC, and that used to enable the free web, is gone. (Sure, advanced users can jailbreak the device and do whatever they want, but it’s the options given to the average user that really matter.)

    The only loophole left is the legacy web browser. Which is a curious relic nowadays: my iPhone won’t even let me access its own locally stored files, but I can use the browser to retrieve and display files freely from a server a thousand miles away! Presently too much still depends on this legacy loophole, so it’s likely to remain supported for the time being, but in the long run, I can see the Web going the way of Usenet — still available in principle, but obsolete, obscure, disreputable, and ignored by the average user.

    Arnold wrote in a recent post that for a mobile device user, Google and Apple own the internet. This is still not completely true, because they can’t control what one accesses through the browser. However, I’m not optimistic about the future of this exception, as there seem to be various strong incentives in favor of the centrally controlled “app” model.

    • Vladimir, you are about to get filesystem access on your iPhone, in a little under two weeks.

  4. I would like to thank the previous commenters for their contribution of relevant technical knowledge to this thread. I think one has to combine that information with the news reports of heavy control over internet and social media communication in other countries: most infamously China, but Turkish, Egyptian, and Pakistani measures have been in the press lately, and Venezuela recently announced controls on “hate” that seem to be a naked pretext for suppressing any opposition, effectively creating a kind of neo-lese-majeste regime.

    Overall, I think the general lesson is that the predominant means of communication and the whole environment is continuing to evolve in the direction of a “Gatekeeper-Policed Internet”, where the state, or a few entities with close relationships to the state, can effectively determine who does and does not get a platform to publish content in a form where the general public can easily get to it.

  5. Dumb core, smart edges is a technical feasibility issue. You can only ‘fit’ so much processing at the nexus. When you look inside Facebook, you see this same architecture repeated internally.

    Facebook has a large campus in Menlo Park. The entire campus would fit in one data center building. There are ~100 of these. Facebook will spend $63B over the next five years on its data center infrastructure.

    This is what it takes to play man in the middle.

    • Aggregate figures cause loss of context. On a per-user basis, that may be only $1 a month. Last year, the NYT reported that the average user spends about 50 minutes a day on Facebook, so the costs you mention come to about 4 cents per hour. Any attempt to do that with smart ends instead of middle men at anything close to the same level of performance would probably cost several orders of magnitude more, and these sorts of economies of scale are a big factor driving consolidation and centralization.
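
      Spelling out the arithmetic (the user count is my assumption; the figures above imply roughly a billion users):

      ```python
      spend = 63e9            # five-year data center spend, from the comment above
      years = 5
      users = 1e9             # assumed user base, implied by the $1/month figure
      minutes_per_day = 50    # the NYT figure cited above

      per_user_per_month = spend / years / users / 12
      hours_per_year = minutes_per_day / 60 * 365
      per_user_per_hour = spend / years / users / hours_per_year

      print(f"${per_user_per_month:.2f} per user per month")  # about $1.05
      print(f"{per_user_per_hour * 100:.1f} cents per hour")  # about 4.1 cents
      ```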

  6. There are several potential choke points:
    – DNS providers can remove your name. Google is a huge provider.
    – Datacenters can refuse to host your server. There are lots of mom-and-pop shops, but also Amazon, Google, and Microsoft.
    – Facebook/Wordpress/Twitter etc. can block or close your account
    – Google search can drop you

    These are “proactive” choke points, in the sense that Google can tag you as hate speech and lock you down. But there are also “reactive” ones, in which, say, a datacenter otherwise minding its own business is targeted because it happens to be hosting a bakery site that does not want to make gay-themed wedding cakes.

    Unfortunately, there is a trend for our private lives and opinions to become public. I think of that fella who ran Mozilla, maker of Firefox, and was forced out for giving to a traditional-marriage cause. But in our panopticon near-future, how long before your cell phone gets tracked into a Chick-fil-A (or adult store or gun show, whatever) and someone who can affect you negatively finds out about it?

    Our parents’ generation thinks along the lines of “I have nothing to hide.” I have something to hide: I’m a conservative, and I don’t want my kid’s school, the IRS, or Google to find out.

  7. Another case of registrar pressure worth looking at.

    The question is a fundamental one regarding power: who gets to make rules, police standards, and impose punitive consequences for non-compliance? And anyone who can impose consequences is inevitably brought under pressure from some stronger entity that can punish them for not using that power in service of its goals, on up the chain until one reaches a sovereign state.

    The only way around this is via blockchain (e.g., Namecoin) or some similar technology which makes such policing all but impossible, so everyone can credibly and accurately say, “sorry, there’s nothing we can do.”
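
    A conceptual sketch of why such a registry resists policing (a toy model, not Namecoin’s actual protocol): a record can only ever be changed by whoever holds the matching key, so there is no administrator able to comply with a takedown demand.

    ```python
    import hashlib

    # Toy model of a key-controlled name registry; not Namecoin's real protocol.
    registry: dict[str, tuple[str, str]] = {}  # name -> (owner key hash, value)

    def register(name: str, key: str, value: str) -> bool:
        """First come, first served; no registrar decides who qualifies."""
        if name in registry:
            return False
        registry[name] = (hashlib.sha256(key.encode()).hexdigest(), value)
        return True

    def update(name: str, key: str, value: str) -> bool:
        """Only the original keyholder can change a record. There is no
        operator with override power, so no one is capable of 'revoking'
        the name on someone else's behalf."""
        owner, _ = registry[name]
        if hashlib.sha256(key.encode()).hexdigest() != owner:
            return False
        registry[name] = (owner, value)
        return True

    register("example.bit", key="holder-secret", value="203.0.113.7")
    assert not update("example.bit", key="registrar-override", value="0.0.0.0")
    ```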

    Of course the big broadband ISPs can still try to throttle, block, or filter any “uncensorable” services, and even if they didn’t, any service that cannot be policed will inevitably attract everyone trying to avoid the attention of the police, not for political thoughtcrime, but for actual crimes.
