Hopefully bad things don’t come in threes. Graeme Oxby, the CEO of London-focused full fibre broadband ISP CommunityFibre, has apologised to customers after the operator suffered a second service outage yesterday, which, much like the first one, may have been related to their Domain Name System (DNS) servers. But this outage was much shorter.
In case anybody has forgotten, CommunityFibre was hit by a protracted outage on Monday, which lasted for several hours and impacted a sizeable portion of their customer base. But some savvy customers were able to work around it by using a third-party DNS provider (Quad9, Google Public DNS, OpenDNS, Cloudflare DNS etc.) to circumvent the ISP’s own domain name system.
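For the technically curious, the workaround above amounts to sending the same kind of lookup to a different resolver. The following is a minimal Python sketch under stated assumptions, not anything published by CommunityFibre or the resolver operators: `build_query` and `ask_resolver` are hypothetical helper names, 9.9.9.9 is Quad9’s public resolver address, and the packet follows the standard DNS wire layout (a 12-byte header followed by one question).

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def ask_resolver(hostname: str, resolver_ip: str = "9.9.9.9",
                 timeout: float = 3.0) -> tuple[int, int]:
    """Send the query over UDP port 53 and return (response code, answer count).

    This sketch does not check that the reply ID matches the query ID,
    which a real client should do.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (resolver_ip, 53))
        reply, _ = sock.recvfrom(512)
    _id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, answers
```

Calling `ask_resolver("example.com")` on a working connection should return a response code of 0 (no error) with at least one answer. If that succeeds while lookups through the ISP’s own resolver fail, the resolver is the likely culprit.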
The DNS service typically works by converting Internet Protocol (IP) addresses into a human-readable form and back again (e.g. 123.56.32.122 becomes examplezfakedomainlols.uk). Services like this tend to be provided automatically by your broadband or mobile provider, usually operating seamlessly in the background.
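Both directions of that conversion can be sketched in a few lines of Python using only the standard library. The function names `reverse_lookup_name` and `forward_lookup` are hypothetical, chosen for illustration. A reverse lookup (IP to name) works by querying a special name built from the address, while a forward lookup (name to IP) simply asks whichever resolver the system is configured to use, normally the ISP’s.

```python
import ipaddress
import socket


def reverse_lookup_name(ip: str) -> str:
    """Return the special domain name that a reverse (IP -> name) lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_lookup(hostname: str) -> list[str]:
    """Ask the system's configured resolver for the IP addresses of a hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For the example address above, `reverse_lookup_name("123.56.32.122")` gives `122.32.56.123.in-addr.arpa`. `forward_lookup` needs a working resolver, so it raises `socket.gaierror` during the kind of outage described here.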
However, it’s not uncommon for ISPs to very occasionally suffer from problems with their DNS servers, which may arise due to a fault or misconfiguration in the system. When this happens, your physical broadband connection may still be live, but many of your requests to online domains will fail (other issues can also cause this, so we’re not 100% certain it was DNS).
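One rough way to tell a DNS fault apart from a dead connection is to compare a lookup by name with a connection to a literal IP address, which needs no DNS at all. The sketch below is only a heuristic: the function names are hypothetical, and the defaults (`example.com` as the test name, `1.1.1.1` on TCP port 443 as the test address) are assumptions that can be swapped for any well-known name and reachable public IP.

```python
import socket
from typing import Callable


def resolves(hostname: str) -> bool:
    """True if the system resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False


def reachable(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to a literal IP succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(
    hostname: str = "example.com",  # assumption: any well-known name will do
    ip: str = "1.1.1.1",            # assumption: a public IP accepting TCP 443
    resolve: Callable[[str], bool] = resolves,
    connect: Callable[[str], bool] = reachable,
) -> str:
    """Classify the state of the connection from two simple checks."""
    name_ok = resolve(hostname)
    ip_ok = connect(ip)
    if name_ok and ip_ok:
        return "ok"
    if ip_ok:
        return "dns-suspect"      # link works, name lookups do not
    if name_ok:
        return "inconclusive"     # name resolved (perhaps cached), test IP unreachable
    return "connection-down"      # neither check succeeded
```

A result of `dns-suspect` matches the symptoms described above, where the physical link stays up but requests to domains fail. It is not proof, since other faults can produce the same pattern.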
Sadly, a similar outage struck CommunityFibre yesterday at around 3:45pm, although unlike the first outage they were able to resolve it within the space of about 30 minutes. Once again, some customers reported that they could work around it by changing DNS provider, but a few others found this didn’t work and opted to use a Virtual Private Network (VPN) instead. The situation prompted the provider’s CEO, Graeme Oxby, to issue a personal apology to his customers.
Graeme Oxby said:
“Unfortunately, I am having to apologise again as we had a network outage for 30 minutes this afternoon. Most customers should have had service restored immediately and we will keep working to get everyone restored.
If you’re still experiencing any issues getting online, please switch the power off to the fibre box for 2 minutes and turn it back on. Then, switch off the power to the router for 2 minutes and turn it back on. Your connection may take 1 to 3 minutes to re-establish.
Please do not attempt any other troubleshooting steps like adjusting any settings or taking out any cables as this may prevent service restoring automatically.
Thank you so much and sorry for any inconvenience.”
We have to give credit to the provider, and its CEO, here for taking responsibility and responding in a much more personal way than we’re used to seeing from ISPs in this market. Most providers tend to just fob customers off with a vague notice and then offer no follow-up after the event (although it would have been even better to get some explanation for the cause).
However, brief outages like this are of course to be expected in the complexity of modern broadband networks, although in this case CommunityFibre may have been dealing with a recurrence of the same issue, or one related to it, that struck on Monday. Hopefully there won’t be a third event anytime soon.
“brief outages like this are of course to be expected in the complexity of modern broadband networks” – disagree. Corner-cutting and poor engineering on networks is the predominant cause of such outages.
We build many, many systems of greater complexity (power stations, aircraft etc) with much lower failure rates. This is simply a question of whether providers are engineering reliability into their networks.
Yes, but you can’t promise 100% uptime on affordable residential services. That is not a realistic expectation, there are always events and failures, even in the best kit.
Ex-BB Sales: “You can have a £20 package, or you can have a £60 package. But do you know what you can’t have? £60 service that only costs £20.”
It is also a question of scale and consequence: an ISP with 300,000 customers can devote fewer resources to making the system foolproof than one with 3 million customers, where the cost per customer is much lower. Likewise it’s annoying when it does go down, but unlike an aircraft it’s not likely to result in people getting killed.
What a nonsense. How many DNS servers do they have, and how many times has their DNS failed in the last five years? Now, take your “many systems of greater complexity (power stations, aircraft, etc.)” and tell me how many of them have failed recently and how many people were affected (counting fatal injuries), compared to the DNS glitch that smarter customers have circumvented. Of course the DNS system can be improved, but please stop speaking on matters you have no knowledge of.
Presumably services like Quad9 avoid this sort of outage by charging more than CommunityFibre?
DNS is the one thing they should be able to run reliably. It is very easy to scale and to make highly redundant at a very low cost. It’s even easier for an ISP given they control the entire network – compared to operating a public DNS server that needs to work from multiple networks with no control over routing. But most ISPs seem to just chuck a couple of boxes up and call it done.
@Ben
Running DNS servers is a nightmare. A lot of things can affect reliability or cause them to become unavailable.
Just because they’re an ISP doesn’t mean they have engineers in house with full working knowledge of whatever stack they’re running, along with the capability to develop and implement patches to it if needed.
They will be using some third party vendor’s server software, there’s no way it will be fully homegrown. So maybe they are dependent on that vendor to fix this issue as per their support contract, maybe they thought they had fixed it only for the second outage to happen and may now be working with them again to investigate further.
There’s a reason many people outsource DNS if they can and have someone else run it. Much like email, it’s time consuming, complex and not worth the headache.
@John Respectfully I disagree. And I speak from experience.
Email is complex. It involves storing data (so needs backups) and is stateful – both making it more difficult to scale out. I ran email many years ago, and believe it isn’t something ISPs should run when there are so many better options.
But DNS is fundamental to the network, and quite simple. A resolving DNS has no data to store other than a cache, which is disposable. The existing software is tried and proven over decades of use, with some newer software available specifically for ISPs.
And no ISP should be reliant on a single DNS server as it is so easy to scale. There is no need to synchronise record sets. It is mostly UDP, so you can easily distribute queries between servers at the packet level. Plus it opens up the use of anycast. It doesn’t take much to maintain a number of physically diverse locations, each with a number of physical servers.
Unfortunately it sounds like they stuck a couple of boxes in a rack and called it a day.
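As a rough illustration of the packet-level distribution described in the comment above, the Python sketch below picks a resolver for each UDP query by hashing the client’s address and port. Everything here is hypothetical: the `RESOLVERS` addresses are placeholders, `pick_resolver` is an illustrative name, and in a real network this job is usually done by routers or load balancers rather than application code. It works because resolvers share no state that would tie a client to one server.

```python
import hashlib

# Hypothetical pool of resolver addresses; placeholders, not real servers.
RESOLVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def pick_resolver(client_ip: str, client_port: int, resolvers: list[str]) -> str:
    """Choose a resolver for one UDP query by hashing the client address and port.

    No per-client state is kept: the same (ip, port) pair always maps to the
    same server while the pool is unchanged, and different pairs spread out
    across the pool.
    """
    key = f"{client_ip}:{client_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(resolvers)
    return resolvers[index]
```

If a server fails, it can simply be dropped from the list and the remaining servers absorb its share, since the only thing lost is that server’s disposable cache.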
Lots of things can be scaled and made more reliable for not much cost, but when you apply that approach to every element of your network and service the cost quickly adds up – a little bit here and a little bit there and soon your cost per user is double that of your rivals.
And what do customers value above all else? The primary factor in choosing a supplier? Lowest price.
The market actively rewards ISPs who spend as little as possible on things.
If he’s really sorry then they should release a firmware to allow changing the upstream DNS and use a wider range of DNS servers by default.
Or disable the DHCP server in the CF router and use an external one. I have an old TP-Link wireless access point that has a built in DHCP Server that allows you to point devices at whichever DNS server you like.
This could of course have been prevented if CF were to set their routers to use Google or Cloudflare or OpenDNS, or even point their DHCP servers to use one of them.
I appreciate their content filtering service uses their own DNS servers but for the minority of customers who want content filtering, they could provide a guide on how to change router DNS server settings.
I wasn’t affected as I use Cloudflare DNS, but appreciate most people are clueless and just expect “the internet” to “work”. And rightly so.
CF seem to have reduced my latest bill by 50p so all in all I’ve done alright out of the outages! #winning
I half thought the same on Monday, but then someone mentioned what Google do with some of the data they obtain from the DNS service they offer – sure people should be free to use it, but not based on their ISP’s configuration.
CF has generally been reliable for me and they’ve achieved something in my area of London that Openreach has completely failed to deliver FTTP in… and for very reasonable prices. Can’t complain at all.
I’m the opposite, I think it’s scummy when an ISP hands off customer DNS to OpenDNS, Google or Cloudflare by default. That’s handing over customers’ browsing data to a 3rd party without asking the customer first.
You can argue it’s minimal consequence but there’s a reason these companies run open resolvers for free! They gather data which is valuable at the right scale.
I agree with Tom,
I don’t mind using other DNS servers, in fact I do, my router is set to OpenDNS, but it should not be the default.
Hasn’t Cloudflare itself had massive outages in the past? Ironic considering they love to pontificate on other companies’ outages.
I agree with the other commenter. The internet is not supposed to be centralised in the way that it has become. It is entirely right and proper for ISPs to run their own essential services such as DNS servers, and they should do so in a competent and well engineered fashion. If a customer wants to use different DNS servers then that’s up to them.
Just remember that one day soon some people will rely on their internet connection to contact the emergency services.
That is the problem. Granted, it will be very few people, but that is not the point.
Only really a valid point if you believe the PSTN to be infallible. It isn’t.
The major providers’ VoIP services may use the same last mile connection as your Internet access, but they don’t use the public Internet. A DNS failure like this one (if that’s what it was) would not have any impact on telephony.
The outage did not affect my parents or other customers who use alternative DNS. I use an ad-blocking DNS.