The Great Firewall of China (c) Ryan McLaughlin

Finally. Finally a journalist has sat down and hashed out a detailed, sourced explanation of how China’s Internet censorship works. May I just say, thank you Mr. Fallows.

James Fallows is The Atlantic’s “man in China”, and his article “The Connection Has Been Reset” is the first article I’ve seen that has given an extensive rundown of the technologies and policies the Chinese government employs in its battle to keep the Internet a sanitary and “harmonized” place.

Fallows explains that the censorship breaks down into four levels: 1. the DNS block, 2. the “Connect” Phase, 3. the URL keyword block, and 4. content scanning.

1. The DNS Block

The first and bluntest is the “DNS block.” The DNS, or Domain Name System, is in effect the telephone directory of Internet sites. Each time you enter a Web address, or URL—www.yahoo.com, let’s say—the DNS looks up the IP address where the site can be found. IP addresses are numbers separated by dots—for example, TheAtlantic.com’s is 38.118.42.200. If the DNS is instructed to give back no address, or a bad address, the user can’t reach the site in question—as a phone user could not make a call if given a bad number. Typing in the URL for the BBC’s main news site often gets the no-address treatment: if you try news.bbc.co.uk, you may get a “Site not found” message on the screen. For two months in 2002, Google’s Chinese site, Google.cn, got a different kind of bad-address treatment, which shunted users to its main competitor, the dominant Chinese search engine, Baidu. Chinese academics complained that this was hampering their work. The government, which does not have to stand for reelection but still tries not to antagonize important groups needlessly, let Google.cn back online. During politically sensitive times, like last fall’s 17th Communist Party Congress, many foreign sites have been temporarily shut down this way.
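To make the two DNS tricks concrete, here is a minimal sketch in Python of a resolver that either returns no address or hands back a bad one. Everything in it is an assumption for illustration: the lookup tables, the function name `censored_lookup`, and the IP addresses (taken from the ranges reserved for documentation) are made up and are not the GFW's real data or code.

```python
from typing import Optional

# Hypothetical "phone directory": hostnames mapped to documentation-only IPs.
REAL_DNS = {
    "www.yahoo.com": "203.0.113.10",
    "news.bbc.co.uk": "203.0.113.20",
    "google.cn": "203.0.113.30",
}

# Hypothetical censor rules, invented for this sketch.
NO_ADDRESS = {"news.bbc.co.uk"}               # answer with nothing at all
BAD_ADDRESS = {"google.cn": "198.51.100.7"}   # answer with someone else's IP


def censored_lookup(hostname: str) -> Optional[str]:
    """Return the IP a user would be given, or None for 'Site not found'."""
    name = hostname.lower()
    if name in NO_ADDRESS:
        return None                  # the no-address treatment
    if name in BAD_ADDRESS:
        return BAD_ADDRESS[name]     # the bad-address treatment (a shunt elsewhere)
    return REAL_DNS.get(name)        # everything else resolves normally


if __name__ == "__main__":
    for host in ("www.yahoo.com", "news.bbc.co.uk", "Google.cn"):
        print(host, "->", censored_lookup(host))
```

The user's browser never learns why the lookup failed or where the substitute address came from, which is what makes this level so blunt.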

2. The “Connect” Phase

Next is the perilous “connect” phase. If the DNS has looked up and provided the right IP address, your computer sends a signal requesting a connection with that remote site. While your signal is going out, and as the other system is sending a reply, the surveillance computers within China are looking over your request, which has been mirrored to them. They quickly check a list of forbidden IP sites. If you’re trying to reach one on that blacklist, the Chinese international-gateway servers will interrupt the transmission by sending an Internet “Reset” command both to your computer and to the one you’re trying to reach. Reset is a perfectly routine Internet function, which is used to repair connections that have become unsynchronized. But in this case it’s equivalent to forcing the phones on each end of a conversation to hang up. Instead of the site you want, you usually see an onscreen message beginning “The connection has been reset”; sometimes instead you get “Site not found.” Annoyingly, blogs hosted by the popular system Blogspot are on this IP blacklist. For a typical Google-type search, many of the links shown on the results page are from Wikipedia or one of these main blog sites. You will see these links when you search from inside China, but if you click on them, you won’t get what you want.
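The blacklist check in the “connect” phase can be sketched the same way. This is only a toy model under stated assumptions: the blacklisted range, the function name `reset_targets`, and the idea of returning a list of recipients are all invented here, and no real packets are sent.

```python
import ipaddress
from typing import List

# Hypothetical blacklist of IP ranges (a documentation-only network).
BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]


def reset_targets(client_ip: str, server_ip: str) -> List[str]:
    """Return the addresses that would be sent a reset for this connection.

    An empty list means the mirrored request passed the blacklist check.
    """
    destination = ipaddress.ip_address(server_ip)
    if any(destination in network for network in BLACKLIST):
        # Both ends are told to hang up, as in the phone analogy above.
        return [client_ip, server_ip]
    return []


if __name__ == "__main__":
    print(reset_targets("192.0.2.5", "203.0.113.9"))   # blacklisted destination
    print(reset_targets("192.0.2.5", "198.51.100.1"))  # allowed destination
```

Because the check runs on a mirrored copy of the request, the original traffic is not held up while the decision is made; the reset simply arrives afterwards.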

3. The URL Keyword Block

The third barrier comes with what Lih calls “URL keyword block.” The numerical Internet address you are trying to reach might not be on the blacklist. But if the words in its URL include forbidden terms, the connection will also be reset. (The Uniform Resource Locator is a site’s address in plain English—say, www.microsoft.com—rather than its all-numeric IP address.) The site [*FLG*].com appears to have no active content, but even if it did, Internet users in China would not be able to see it. The forbidden list contains words in English, Chinese, and other languages, and is frequently revised—“like, with the name of the latest town with a coal mine disaster,” as Lih put it. Here the GFW’s programming technique is not a reset command but a “black-hole loop,” in which a request for a page is trapped in a sequence of delaying commands. These are the programming equivalent of the old saw about how to keep an idiot busy: you take a piece of paper and write “Please turn over” on each side. When the Firefox browser detects that it is in this kind of loop, it gives an error message saying: “The server is redirecting the request for this address in a way that will never complete.”
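Here is a rough Python sketch of the keyword match and the “Please turn over” loop. The forbidden terms are placeholders, the names `url_hits_keyword` and `fetch` are hypothetical, and the redirect cap of 20 is an assumed client-side limit rather than a documented browser setting.

```python
from urllib.parse import unquote

# Placeholder terms for illustration; the real list is not public.
FORBIDDEN_TERMS = {"bannedword", "examplemine"}


def url_hits_keyword(url: str) -> bool:
    """True if the decoded, lower-cased URL contains a forbidden term."""
    text = unquote(url).lower()
    return any(term in text for term in FORBIDDEN_TERMS)


def fetch(url: str, max_redirects: int = 20) -> str:
    """Toy client: a matching URL keeps being redirected back to itself."""
    hops = 0
    while url_hits_keyword(url):
        hops += 1                    # each pass is one "Please turn over"
        if hops > max_redirects:
            return "redirect loop"   # the client gives up, as Firefox does
    return "ok"


if __name__ == "__main__":
    print(fetch("http://example.com/news"))
    print(fetch("http://example.com/BannedWord/page"))
```

Note that the match is on the text of the URL alone, so a page can be trapped even when its IP address is not on any blacklist.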

4. Content Scanning

The final step involves the newest and most sophisticated part of the GFW: scanning the actual contents of each page—which stories The New York Times is featuring, what a China-related blog carries in its latest update—to judge its page-by-page acceptability. This again is done with mirrors. When you reach a favorite blog or news site and ask to see particular items, the requested pages come to you—and to the surveillance system at the same time. The GFW scanner checks the content of each item against its list of forbidden terms. If it finds something it doesn’t like, it breaks the connection to the offending site and won’t let you download anything further from it. The GFW then imposes a temporary blackout on further “IP1 to IP2” attempts—that is, efforts to establish communications between the user and the offending site. Usually the first time-out is for two minutes. If the user tries to reach the site during that time, a five-minute time-out might begin. On a third try, the time-out might be 30 minutes or an hour—and so on through an escalating sequence of punishments.
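The escalating time-out is easy to model. The sketch below assumes the 2, 5 and 30 minute steps from Fallows's example and caps the penalty at the last step; the class name `Blackout`, its methods, and the choice to count time in plain minutes are all made up for illustration, and the real schedule is not public.

```python
from typing import Dict, Tuple

# Penalty steps in minutes, taken from the example in the article.
PENALTIES_MIN = [2, 5, 30]


class Blackout:
    """Tracks temporary "IP1 to IP2" blackouts between a user and a site."""

    def __init__(self) -> None:
        # (client, server) -> (minute the blackout ends, strikes so far)
        self._state: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def flag(self, client: str, server: str, now: float) -> None:
        """Called when a content scan finds a forbidden term."""
        self._state[(client, server)] = (now + PENALTIES_MIN[0], 1)

    def allow(self, client: str, server: str, now: float) -> bool:
        """Return True if a connection attempt at minute `now` may proceed."""
        key = (client, server)
        if key not in self._state:
            return True
        until, strikes = self._state[key]
        if now >= until:
            del self._state[key]     # blackout expired, slate wiped clean
            return True
        # An attempt during the blackout starts a longer one.
        step = min(strikes, len(PENALTIES_MIN) - 1)
        self._state[key] = (now + PENALTIES_MIN[step], strikes + 1)
        return False


if __name__ == "__main__":
    gfw = Blackout()
    gfw.flag("user", "site", now=0)
    for minute in (1, 3, 40):
        print(minute, gfw.allow("user", "site", now=minute))
```

In this model an impatient user who retries at minute 1 and again at minute 3 ends up locked out until minute 33, while a patient one would have been back online at minute 2.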

He goes on to warn that folks who continually search for items that hit one block or another may eventually attract the attention of the authorities and be flagged as a person that needs closer examination (financial ruin, jail time, organs on eBay… whatever).

According to The Atlantic’s follow-up article, “Penetrating the Great Firewall”, Fallows was able to put together such a detailed report on the GFW (or as Danwei loves to use, “Net Nanny”) through his long-standing connections with folks in the tech industries on both sides of the globe. Under the condition of anonymity, many high-level technicians agreed to walk him through what happens.

Again, go read the entire article, it’s an absolutely fantastic read that tackles a topic much larger than just Internet censorship – that of using the perception of non-censorship to control not the amount of information a population has, but rather just what that information is. A starved person with a recently filled belly need never know they just ate a plate of shite.

H/T to Matt Schiavenza for pointing me to the article.

Discussion

  1. Hey Ryan

    I have to agree Fallows’s article was really useful. Now if he could only figure out a way to download CBC podcasts when their website is blocked, I’d be happy.

    J.

  2. I think the next step in journalism is to uncover the way keywords and blacklists are compiled. For example, the humor website http://hillaryismomjeans.com was working fine for me two days ago but was suddenly banned yesterday. What set off the censor flags? (If you proxy to the website you may understand my puzzlement.)

    Similarly, a journalist could uncover how censorship rules are propagated to ISPs around China. For instance, my friends and I noticed a lag time in censorship from north-south and south-north. Will regions ever ignore a centralized ban? Which regions are more strict in banning?

  3. @JohnG: I know this was discussed in previous posts about the NHL broadcasts, but are proxies completely ineffective?

    @Matthew: Yeah, would be great. There is a list on Wikipedia, but you’ll have to go through a solid proxy to get there – I got shut down twice while trying to view it just now (both using The Free Dictionary‘s Wikipedia function and the Gollum Browser).

    Re: the lag – I’ve noticed that too.

    As for why you can’t get on a silly site like the one you mentioned, it’s possible that it’s hosted on a shared server (hosting where a huge number of Web sites share a single IP), and another site in the bunch has been blocked. But I’m guessing.

  4. Ryan I don’t know about the IP range ban in that site’s case. What’s odd is that it suddenly got blocked after it became an Internet meme, suggesting that the Great Firewall may have a “defense mechanism” that blocks sites suddenly getting a lot of traffic. Another possibility is that someone started writing messages on the site that included banned keywords.

  5. The banned keywords thing would kill it after it loads (2nd load), so that seems unlikely. Though after a quick check of the other domains on that IP, I can’t see anything offensive – meh.

    China’s a safer place because of it. 😉

  6. The second linked article was a good read, and reminded me of Dalian last year when during the Summer Davos conference Wikipedia was unblocked.

    For point #1, I have noticed that CNC have recently taken to hijacking 404 Page Not Found errors and redirecting them to a CNC page. That is interesting because a blocked site doesn’t get redirected to this page, which makes blocked sites more conspicuous.

  7. Pingback: Bone:As you know » Bone’s View (010):公众政治参与、政治和解

  8. Great Post

    It’s good to get a better idea of how things are filtered.

    @John tried a VPN? witopia.net is a steal at 40USD a year. I also am led to believe they have servers in HK which apparently helps speedwise.

  9. Pingback: Troubles in China’s Wild West | Lost Laowai China Blog

  10. Pingback: China plans a 'Cloud Computing' zone, free from Great Firewall | Penn Olson

  11. I really loved the article about how the GFW works. I have not tried to type certain words or phrases that may get me and my wife in trouble, but I have also noticed that popular sites that have a lot of traffic will be banned. I think what I miss most about the USA is the Internet freedom that many take for granted.

  12. Pingback: Giving Thanks to the Chinese Great Firewall | Startup Marketing | Making Lemonade by Jacqui Chew

  13. Pingback: Giving thanks to the Great Firewall – jacquichew
