There is a load of noise being generated right now about Wikipedia being filtered by six of Britain’s largest ISPs because of instances of child pornography being found on certain articles. In a rather opportune coincidence, I came across the effect of this filtering mechanism a little over a week ago because of problems accessing some Ning networks that I am a member of or operate. Importantly, none of these are child pornography sites; they are engineering and software coding networks, including the Ning Developer and Network Creator networks. I am not an apologist for those despicable scum, just someone who managed to get a written explanation of the transparent caching mechanism that is used by one of these ISPs, Plusnet. I thought it would be useful to post the description in full for those like Cory Doctorow, so they could understand the mechanism better rather than have to reverse engineer it.
My problems started when I received posting timeouts while trying to sign up to the Ning Developer network, which produced this interesting error message:
(104) Connection reset by peer
An error condition occurred while reading data from the network. Please retry your request.
Generated Tue, 25 Nov 2008 11:08:54 GMT by pcl-iwfcache02.plus.net (squid)
In the end, this message turned out to be caused by problems in Ning’s network configuration, which they resolved, but it acted as the canary that revealed the transparent caching solution was in operation. Why it was operating was not particularly clear. After the Ning error went away I thought nothing more of it until a Tweet came back from someone else on Plusnet who operated Nings, to which I replied with a comment about Plusnet and transparent caching. Plusnet, like other ISPs, are now on Twitter, and there was an immediate response, along with a new update on my open Plusnet ticket about my problems. The response was:
Dear Mr Nock,
Apologies for the confusion but all of your traffic does not pass through these proxies. Their purpose is to prevent access to websites that are listed on the Internet Watch Foundation’s child abuse lists – http://www.iwf.org.uk/
Our routers firstly check the IP address of the server that’s hosting the URL you’re trying to access. If they determine that the IP address is also used to host one of the websites on the IWF list then your request is passed to the IWF proxies.
A lookup is then done and if the address you’re trying to access matches one on the list then the request is denied. If it doesn’t match, then the request should be honoured and you shouldn’t experience any problems.
Here’s a visual depiction – http://community.plus.net/wp-content/uploads/2008/12/iwf_process_flow.png
This isn’t foolproof though and we have seen issues akin to what you’re experiencing with other shared sites like Yahoo profiles and Flickr. The URL you’re trying to access will be perfectly safe (and won’t be on the IWF list) but the process is breaking down somewhere.
I’ll do some playing around with ning on a test line and assuming I can replicate the problem I’ll raise it internally so we can fix it for you.
For those interested, here is the visual depiction referenced above:
I was initially concerned that all my content was passing via this transparent cache, which would be both a privacy and a performance concern, but as described above this does not happen unless I hit the criteria set by the ‘suspect list’. However, it seems that the suspect list is itself a little suspect, particularly for shared sites like Ning, Flickr and now, as we see, Wikipedia.
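To make the two-stage check in the Plusnet reply a little more concrete, here is a rough Python sketch of how I read it: the router first matches on the destination IP address, and only requests to a "suspect" IP are handed to the proxy, which then matches on the full URL. This is purely my own illustration and not Plusnet's or the IWF's actual implementation. The host names, IP addresses, lookup table and the `route` function are all made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical data for illustration only; nothing here comes from Plusnet or the IWF.
BLOCKED_URLS = {"http://shared.example/bad/page"}
# Stand-in for a DNS lookup (addresses are from the documentation range 192.0.2.0/24).
DNS = {"shared.example": "192.0.2.10", "other.example": "192.0.2.20"}

# Stage 1 input: every IP address that hosts at least one listed URL.
SUSPECT_IPS = {DNS[urlsplit(url).hostname] for url in BLOCKED_URLS}


def route(url: str) -> str:
    """Return how a request would be handled under the assumed two-stage check."""
    ip = DNS[urlsplit(url).hostname]
    if ip not in SUSPECT_IPS:
        return "direct"            # stage 1: IP not suspect, proxy never involved
    if url in BLOCKED_URLS:
        return "proxied-denied"    # stage 2: exact URL is on the list
    return "proxied-allowed"       # stage 2: same host, but this URL is not listed


for url in ("http://other.example/",
            "http://shared.example/harmless",
            "http://shared.example/bad/page"):
    print(url, "->", route(url))
```

The middle case is the interesting one: a harmless page on a shared host still travels through the proxy simply because it shares an IP address with a listed URL, which would explain why sites like Ning, Flickr and Wikipedia are the ones where the process breaks down.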