Posted: 17th Jan, 2009 By: MarkJ
UK ISP Demon Internet (THUS, C&W) reports that an error in the implementation of the
Internet Watch Foundation's (IWF) child porn block list has been resolved. The issue caused access to sites stored on the popular
Internet Archive Wayback Machine (IAWM) website to be hindered (
original news) by incorrect URLs (website addresses) that populated the site's database of links.
It's now known that the IWF does indeed have a block in place on some of the content stored by the IAWM, but that this was never meant to inhibit viewing of the whole site. One of Demon's technical bods, Brian, explains that the situation occurred because of a problem with how its IWF filter interacted with the IAWM's cache (fast memory) servers:
"
The filter we use uses a proxy to inspect suspect URLs. Where a URL is not on the IWF list (ie, the server hosts some child abuse content, but only a single URL is blocked), we have to proxy the connection on to the original server the request was intended for.
Here's where it gets interesting. The proxy sends various bits of information with the request. One of these is the name of the proxy itself. Not unsurprisingly, this is 'iwfwebfilter.thus.net'.
It seems that archive.org use caches at their end to speed up access to pages. When a page is requested, if it's not in the cache, it is built from the archive and made available to the requestor. As part of this build process, the server takes a hostname from the cache, along with the date portion of the URL, etc, to create the 'base URL' of the page.
Unfortunately, the archive.org software would take the server name we supplied and use it in place of 'web.archive.org', which is why you'd get [AN INCORRECT URL]," said Brian on
Demon's support group.
Brian goes on to explain that the problem would only occur in some situations, such as when the page wasn't already stored in the cache. The long and short of it appears to be that the fault actually lay with the IAWM, which has now fixed the issue on its site.
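
For readers who want to picture the mechanism Brian describes, here is a rough sketch in Python. It is not archive.org's actual code: the function names, the `proxy_name` field and the in-memory `page_cache` are all assumptions made for illustration, and the report does not say which part of the request carried the proxy's name. The sketch only shows the general pattern of a base URL being built from a host name supplied with the request on a cache miss, and how pinning the host name to 'web.archive.org' avoids the problem.

```python
CANONICAL_HOST = "web.archive.org"  # the host name the archive's links should use

# Hypothetical in-memory cache: (timestamp, original_url) -> base URL
page_cache = {}


def build_base_url(timestamp, original_url, request_info, trust_request_host):
    """Build the base URL for an archived page (illustrative only)."""
    if trust_request_host:
        # Behaviour as described in the report: a host name that arrived
        # with the request is used in place of the archive's own name.
        host = request_info.get("proxy_name", CANONICAL_HOST)
    else:
        # Safer variant: ignore request-supplied names entirely.
        host = CANONICAL_HOST
    return f"http://{host}/web/{timestamp}/{original_url}"


def fetch_base_url(timestamp, original_url, request_info, trust_request_host=True):
    """Return the cached base URL, building it only on a cache miss."""
    key = (timestamp, original_url)
    if key not in page_cache:
        page_cache[key] = build_base_url(
            timestamp, original_url, request_info, trust_request_host
        )
    return page_cache[key]


# A request passed on by the filter proxy, and one made directly.
via_proxy = {"proxy_name": "iwfwebfilter.thus.net"}
direct = {}

cached_first = fetch_base_url("20080101000000", "http://example.com/", direct)
cached_again = fetch_base_url("20080101000000", "http://example.com/", via_proxy)
uncached = fetch_base_url("20080101000000", "http://example.org/", via_proxy)
```

In this sketch the page that was already cached keeps its correct address even when later requested through the proxy, while the uncached page picks up 'iwfwebfilter.thus.net', which matches the "only in some situations" behaviour described above. Calling `fetch_base_url` with `trust_request_host=False` always yields a 'web.archive.org' address.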