
BT Reveal Cause of 999 Emergency UK Call Handling Outage

Friday, Jun 30th, 2023 (4:20 pm)

Broadband and telecoms giant BT has revealed that last Sunday’s outage and disruption to their 999 emergency services number, which resulted in 11,470 unique emergency calls being “unsuccessfully connected”, was caused by a “complex software issue” that had never previously been seen through their testing regime.

Apparently, the software bug was causing a “caching issue”, which resulted in impacted calls not being routed correctly and the user’s call being disconnected (possibly due to the call timing being out of sync). A “robust temporary fix” was put in place to rectify this last weekend, after several hours of chaos, and they’re currently testing the permanent fix.

BT noted that, for each of the missed calls, they commit to call the customer back and establish whether further help is needed and, if required, connect to the appropriate emergency service. If BT’s contact is unsuccessful, they pass the detail onto the police to investigate. Following the incident, this process was finally completed by 08:16 on Wednesday, 28th June, which is several days after the incident itself.


At this stage, it’s not known how many people may have been harmed, or how much property may have been affected, by health or crime related issues as a result of the disruption. “We are putting in place significant improvements to our systems and processes, and we will fully cooperate with Ofcom’s investigation,” added BT. The operator has also released a long summary of the day’s events, which we’ve published in full below.

BT’s Overview of events (999 Outage)

At 06:24 on Sunday 25 June, our 999 call handling agents started to experience problems with some emergency calls being cut off on connection to the emergency services. Our network management team was informed at 06:40 and an internal incident was raised at 07:02.

After initial investigations failed to identify or ameliorate the problem, a conference call was opened at 07:20 with our technical specialist teams to further investigate.

The teams were aware of a growing number of 999 calls being impacted and the root cause of the issue was unknown.

To ensure a triple resilient network, we run the service on three primary network clusters, each of which has the capacity to handle all 999 call traffic. It was unclear which network cluster was affected because no alarms were presented. The decision was therefore made to switch from the three primary network clusters to the 999 backup system at 07:25.

Transfer to the backup system was attempted at 07:31. At 07:46 it became clear that this had been unsuccessful. While the backup system itself was ready to handle calls, the complex transfer process had not been completed successfully. We have since put in place steps to simplify this process.

In our ongoing attempts to restore full service we returned one of the primary network clusters back into operation. However, as became apparent later on, the network cluster that had been selected to attempt service restoration is where the fault lay. This resulted in callers being unable to connect to the 999 service between 07:32 and 08:50.

The incident priority was increased at 07:47 and after the relevant teams had been briefed, at 08:01 the Lead 999 Centre notified all Emergency Authorities of the situation simultaneously via email.

The incident was designated to SI (serious incident) status at 08:20 which, in-line with our internal processes, meant that our Civil Resilience team became notified (at 08:44).

Transfer to the backup system was successfully initiated at 08:37 (for calls from landline) and 08:50 (for calls from a mobile). This significantly improved 999 call answer success, albeit with only basic service functionality which meant increased call pick-up and call handling time.

Ofcom was alerted via a call at 09:05. The first of our external-facing media statements was issued at 09:35 confirming the issue and making clear that our backup system was online and that people should call 999 as usual. At 09:45 an email notification was sent to the Department for Science Innovation & Technology (DSIT), Ofcom and Devolved Administrations.

With the backup system operating successfully, the teams’ primary focus switched to root cause investigations to enable return to the primary 999 call system. At 11:54, following further diagnosis, we started to reintroduce non-emergency traffic to the non-impacted primary network clusters, while continuing to isolate the impacted cluster.

After extended monitoring we began moving emergency calls onto the primary network clusters from 14:52. By 16:56 all emergency calls were being handled by the primary 999 system.

In parallel, diagnostics and a temporary fix on the impacted network cluster meant it was re-introduced at 20:50, firstly for non-emergency traffic. After no issues were experienced, emergency calls were re-introduced at 21:29.

At 22:14, following approvals from Government, we issued the second of our media statements confirming that the service was restored, and we were no longer relying on the backup system.
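The gaps between the timestamps in BT's overview are easier to compare once they are worked out explicitly. The following is a minimal sketch of that arithmetic, not anything BT has published: the times are copied from the overview above, while the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
from datetime import datetime

# Times copied from BT's overview (all on Sunday 25 June 2023, 24-hour clock).
# The keys are hypothetical labels invented for this sketch.
TIMELINE = {
    "first_cut_offs": "06:24",
    "network_team_informed": "06:40",
    "incident_raised": "07:02",
    "callers_unable_to_connect": "07:32",
    "backup_live_for_mobile": "08:50",
    "all_calls_on_primary": "16:56",
    "impacted_cluster_back": "21:29",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def as_hours_minutes(total_minutes: int) -> str:
    """Format a minute count as, for example, '1h 18m'."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# (description, start key, end key) for each interval of interest.
INTERVALS = [
    ("Problem seen -> network team informed", "first_cut_offs", "network_team_informed"),
    ("Network team informed -> incident raised", "network_team_informed", "incident_raised"),
    ("Callers unable to connect", "callers_unable_to_connect", "backup_live_for_mobile"),
    ("Problem seen -> all calls on primary system", "first_cut_offs", "all_calls_on_primary"),
    ("Problem seen -> impacted cluster back on 999", "first_cut_offs", "impacted_cluster_back"),
]

if __name__ == "__main__":
    for label, start_key, end_key in INTERVALS:
        gap = minutes_between(TIMELINE[start_key], TIMELINE[end_key])
        print(f"{label}: {as_hours_minutes(gap)}")
```

On these figures, the first two intervals come to 16 and 22 minutes, and the window in which callers could not connect to 999 comes to 78 minutes.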

By Mark Jackson
Mark is a professional technology writer, IT consultant and computer engineer from Dorset (England). He also founded ISPreview in 1999 and enjoys analysing the latest telecoms and broadband developments. Find me on X (Twitter), Mastodon and Facebook.
Comments
25 Responses


  1. Ad47uk says:

    this is the problem these days with having software control everything. This is the problem with relying on computers and apps.
    I hope that people did not die or become too ill because of this.

    1. Anon says:

      Since hardware and humans fail, make mistakes, and often don’t scale well, what do you propose doing, Ad47uk?

    2. Ivor says:

      “these days”?

      telephone switching has been software based since the 80s (and exclusively by the late 90s once BT completed the digital switch rollout), especially at the sort of centralised level at which 999 call handling would have occurred

      they’ve had a pretty good record so far, though “pretty good” isn’t good enough for something like this.

    3. 4chAnon says:

      Proven right that Born in 1847 is indeed your true name

    4. Ad47uk says:

      @Ivor, I realise that, but stuff seems to be less reliable these days and it seems to be getting worse, so what are they doing?

      i also realise that software is in most things we buy, including Tv and even washing machines. But the more complex they are, the more they seem to go wrong.
      My old washing machine lasted for years, no software in that, just a plain old mechanical timer, I got rid of it because it was starting to leak and would cost more than it was worth to fix, so I got myself a cheap Beko, it works fine even if it does jump around a bit on spin :). I know people with expensive machines that are supposed to have computerised this and that and yet they have more problems with them than i had with my old machine and this simple Beko.

      Sometimes the simple ways are still the best.

    5. Chris W says:

      How many lives have been saved by improvements in technology though? Being able to immediately find and pass on the location of a 999 call, receive location data from mobile devices and handle the ever-rising call volumes (35 million a year at last count, up from 25 million back in 2000) would never have been achievable without the modern systems being used.

    6. Buggerlugz says:

      Still not excusable though. The system had to be 100% foolproof to be fit for purpose. It obviously wasn’t tested well enough.

  2. okaf says:

    As opposed to what.. an old fashioned switch board with operators plugging jacks in to the right place to connect people? Yeah right lol

    1. Ad47uk says:

      Yeah, good idea 🙂

      Should have stayed with mechanical means, I remember hearing the exchange when it used to be by our cathedral, walk past the building and you could hear the switches as someone dialled. Never had the problems they do now, the only problem I remember having when I lived with my parents was when water got into the pit.
      I expect other people had more problems, but things seemed to be more reliable then.

      No self scan that say unrecognised item in the baggage area. Cash, you know where you are with cash, no silly spy cards, loyalty was done by stamps. But then everything is about big data these days.

    2. 125us says:

      You think that the current system which has failed once in the 33 years since the trunk network was digitalised should be retired and replaced with a mechanical one? Abandon the system that puts calling number and exact location on the operator’s screen before they’ve even answered the call for one that required an engineer to be called out, drive to the exchange, do a painstaking manual trace of the call, look the number up in paper records and then get that information to the emergency services, hoping that the caller hasn’t died in the intervening 45 minutes? Wow.

    3. Mel says:

      Ah yes good old mechanical means…

      I remember making three failed attempts to dial 999 using the good old fashioned mechanical dial on a trimphone in the room directly below the bedroom that was well on fire at the time, after waking up to smoke haze in my own bedroom and getting everyone out. I was still quite panicked at the time, and worried that the ceiling might come down and probably didn’t allow the dial to spin back all the way which seemed to take an interminable age, would have given anything for a digital push button phone at the time. Gave up and got a neighbour to call in the end.

      I dread to think how many extra staff it would require to deal with 999 calls and maintain a mechanical system and how many calls would get lost every day and how much longer it would take to deal with each call.

      Bugs at least can be fixed. And as far as microcontroller based washing machines go, we had our first in 1981, and are currently on our third, only real disadvantages of them are they usually refuse to carry on when they detect a fault, although they often tell you exactly what needs fixing, and new control boards usually come unprogrammed these days, so if you can’t fix the board you’d have to pay for another copy of software.

    4. REGIS says:

      @125us “You think that the current system which has failed once in the 33 years since the trunk network was digitalised should be retired and replaced with a mechanical one?”

      They did it with Concorde…… one crash in 27 years and the whole fleet was mothballed.

      No offence meant but if one of your loved ones had been affected by not being able to get help and suffered life changing injury or even death then you would be singing another tune.

      My biggest problem with the whole thing is not that the system failed but the amount of time it took to first notice the problem (why were no alarms raised by the system that it wasn’t working as intended, instead of 16 mins of the call operators saying there was a problem before anyone took notice, then another 22 mins before anyone looked into it), and also that the BACKUP failed to work properly. Are they going to blame China because there’s one cable with a Huawei sticker on it?

      Where I work we are at risk of flooding and the anti-flood defences are tested regularly every month. I’d like to know when the last test of this “backup” was done.

    5. Just a thought says:

      Some progress is good, some is bad.

      Muses:
      How do you connect a mobile phone call with a plugboard?…. Good job we still have our landline:-) Oh wait, they’re ditching that
      …..

      No system can be 100% reliable. The operator on your local plug board could have a heart attack and fail to connect you. (How many simultaneous back up operators do you employ?) The wire could come loose on your exchange plug (or a joint could fail on a mechanical Strowger exchange).

  3. DL says:

    “There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”

    1. Morris Oxford says:

      and memory overwrites.

  4. Buggerlugz says:

    Oh to have actual hardware connecting calls eh……….then again knowing our government as well as we do they’d have probably bought some Chinesium solution.

    1. REGIS says:

      The system that according to Google has an uptime of 99.45% since it was launched years ago AND not only has one all purpose call number 911 BUT fire, ambulance, coast guard and police also have their own dedicated emergency call numbers…….

      I’d have that system in a heartbeat!

      You’re talking about a country that publicly apologizes if a train is so much as 2 minutes late, when my mum fell we were told an ambulance would not be available for at least 3 hours.

    2. Buggerlugz says:

      It’s still not good enough. It needs to be 100% reliable. As do the services it puts into action.

    3. Chris W says:

      There is no such thing as 100% reliable.

  5. Steve says:

    I have read all of the arguments above for and against the current system. I believe the vulnerability in the system isn’t in new switching technology but in centralisation of the call handling system. At one time the 999 service would be located in the operator centre (100,191,192,151\2 and 999 services) and typically housed in a provincial town’s main telephone exchange serving a hinterland of typically up to 30 miles radius. That centre would handle only calls for their immediate area and pass them on to the local emergency services. Such a system could still exist using digital telephone exchanges and if a failure occurred it would affect that area only and not the whole country. It is too tempting when distance is no object to put all your eggs in one basket.

    1. Buggerlugz says:

      Don’t give them ideas Steve, they’ll probably offload it to an Indian Scam call centre to handle the calls.

    2. The Facts says:

      How many centres would you have?

    3. Steve says:

      There was no suggestion of offshoring the service, in fact the opposite. The security of the service would be secured if it was kept very local.
      As for the second reply, I would suggest one per county or slightly less depending on the density of population.

  6. Angie says:

    I tried calling for an ambulance for 40 minutes on the morning of 25.06.23. The calls were answered then I was placed in a queue and after 3 minutes every call was disconnected. Sadly for us my husband who was 49 passed away. The service obviously isn’t fit for purpose if this can happen. I was just 1 of over 11,000 people pleading for help that day.. the paramedics were amazing but unfortunately my call wasn’t answered in a timely manner.

Copyright © 1999 to Present - ISPreview.co.uk - All Rights Reserved