Weekly internet health check, US and worldwide

ThousandEyes, which tracks internet and cloud traffic, provides Network World with weekly updates on the performance of three categories of service provider: ISP, cloud provider, UCaaS.

The reliability of services delivered by ISPs, cloud providers and conferencing services, a.k.a. unified communications-as-a-service (UCaaS), is an indication of how well served businesses are via the internet.

ThousandEyes is monitoring how these providers are handling the performance challenges they face. It will provide Network World a roundup of interesting events of the week in the delivery of these services, and Network World will provide a summary here. Stop back next week for another update, and see more details here.

Updated June 21

Global outages across all three categories last week increased from 332 to 427, up 29% from the week before. In the US, total outages jumped from 173 to 275, a 59% increase.

Worldwide, the number of ISP outages increased from 263 to 352, a 34% increase. In the US they increased from 146 to 250, a 71% increase.

Cloud-provider network outages globally more than doubled for the second week in a row, from 10 to 23. In the US, cloud-provider network outages decreased from five to four.

Globally, collaboration-app network outages decreased from seven to six, while in the US they increased from three to four.
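The week-over-week percentages quoted in these updates can be reproduced from the raw counts. The snippet below is a minimal sketch, assuming the figures are simple percent changes rounded to the nearest whole number; the function name `pct_change` and the `june_21` table are illustrative and not part of any ThousandEyes tooling.

```python
def pct_change(previous: int, current: int) -> int:
    """Week-over-week change as a whole-number percentage (assumed rounding: nearest)."""
    if previous == 0:
        # Going from zero outages to some outages has no defined percent change.
        raise ValueError("percent change is undefined when the previous count is zero")
    return round((current - previous) / previous * 100)


# (previous week, last week) outage counts as quoted in the June 21 update.
june_21 = {
    "all categories, global": (332, 427),
    "all categories, US": (173, 275),
    "ISP, global": (263, 352),
    "ISP, US": (146, 250),
}

for label, (previous, current) in june_21.items():
    change = pct_change(previous, current)
    print(f"{label}: {previous} -> {current} ({change:+d}%)")
```

Small counts, such as the cloud-provider and collaboration-app figures, are reported as absolute numbers rather than percentages, which is why the sketch rejects a previous count of zero instead of returning a value.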

There were two significant outages during the week.

About 12:20 a.m. EDT on June 17, Akamai’s DDoS mitigation service, Prolexic Routed, experienced a service disruption that made its customers’ websites, including major financial services firms and airlines, unreachable. The outage affected many of the approximately 500 Akamai Prolexic customers that use the service. During the incident, there appeared to be a massive surge in network outages that also coincided with application availability issues. Akamai identified the cause as the Prolexic routing process. The outage was most severe in its initial minutes, but lasted until about 4:22 a.m. EDT.

About 2:40 p.m. EDT on June 15, Cogent Communications experienced an outage that affected downstream providers as well as Cogent customers in the US. The outage lasted around 35 minutes, divided into three occurrences over the period of an hour. The first occurrence appeared centered on Cogent nodes in Chicago, Illinois, and Atlanta, Georgia. The outage appeared to clear around 2:45 p.m. EDT but reappeared five minutes later. Fifteen minutes into the outage the nodes in Atlanta appeared to recover, leaving only the Cogent nodes located in Chicago exhibiting outage conditions. This continued for another four minutes before appearing to clear. The third occurrence of the outage was observed around 3:10 p.m. EDT centered at Cogent Chicago nodes. This third occurrence was the longest of the three, lasting around 24 minutes. The outage was cleared around 3:35 p.m. EDT.

Updated June 14

Total outages across all three categories last week jumped from 222 to 332, a 50% increase over the week prior. In the US, outages more than doubled from 80 to 173, a 116% increase.

The number of ISP outages worldwide went from 182 to 263, a 45% increase. In the US they increased from 71 to 146, a 106% increase.

Cloud-provider network outages globally more than doubled from four to 10, and in the US they grew from zero to five.

Globally, collaboration-app network outages jumped from one to seven, and in the US, from zero to three.

There were two notable outages during the week. Around 5:50 a.m. EDT on June 8, Fastly suffered a major outage that impacted the sites and applications of many of its customers. The outage, lasting about an hour, caused users to have issues loading content and accessing sites around the globe. Not all customers were affected for the full hour because they were able to use alternative services to deliver content to users. Around 6:27 a.m. EDT, Fastly announced it had identified the source of the outage, and around 6:50 a.m. EDT announced that all services had been restored and the outage was cleared. An in-depth view of the outage can be found here.

About 1:10 a.m. EDT on June 9, Zayo Group experienced an outage that affected some of its partners and customers in countries including the US, Germany, the Netherlands, Canada, the UK, Austria, Hong Kong, Australia, Brazil, Japan, Russia and Malaysia. The outage lasted around 54 minutes and appeared to center on Zayo nodes in Denver, Colorado, and Salt Lake City, Utah. Five minutes later, the Salt Lake City nodes appeared to recover but outage conditions started in nodes in Seattle, Washington, and London, UK. Thirty minutes into the outage it grew to include nodes in Chicago, Illinois, before being cleared around 2:10 a.m. EDT.

Updated June 7

Global outages across all three categories last week decreased from 265 to 222, a 16% decrease. In the US they dropped from 128 to 80, a 38% decrease.

ISP outages globally last week decreased from 211 to 182, a 14% decrease, while in the US they decreased from 105 to 71, a 32% decrease.

Globally, cloud-provider network outages decreased from nine to four, and in the US they decreased from two to zero.

Collaboration-app network outages worldwide dropped from six to one and in the US dropped from five to none.

About 3 a.m. EDT on June 2, the ISP PCCW experienced a 19-minute outage impacting some of its customers and networks in the US. It appeared to center on PCCW infrastructure located in Ashburn, Virginia, and was cleared around 3:25 a.m. EDT.

Around 6:45 p.m. EDT on June 1, Microsoft experienced a 29-minute outage that impacted some downstream partners and access to services running on Microsoft environments. It appeared to be centered on Microsoft nodes located in Dublin, Ireland, and was cleared around 7:15 p.m. EDT. Given the duration and timing relative to the location of the nodes at the center of the outage, it is likely to have been a maintenance exercise.

Around 1:05 a.m. EDT on June 1, Flag Telecom Global Internet experienced an outage on its network that lasted around an hour and 51 minutes over a three-hour period. It affected customers and downstream partners in countries including the US, Australia, India, France, the Netherlands, Singapore, the Philippines, Hong Kong, Germany, Brazil, and Taiwan. It appeared to be centered on Flag Telecom nodes located in Singapore. Five minutes after the initial outage, Flag Telecom nodes in Hong Kong also exhibited outage conditions, coinciding with an increase in the number of impacted countries, customers, and partners. After a further five minutes, the nodes located in Hong Kong appeared to recover for 10 minutes before exhibiting outage conditions again for five more minutes. Flag Telecom nodes located in Singapore also appeared to recover about 50 minutes after the initial outage. Around 2 a.m. EDT, the nodes located in Singapore again began exhibiting outage conditions. A series of varying-duration outages, all centered on Singapore nodes, was observed for the next two hours. The outage was cleared around 4:05 a.m. EDT.

Updated May 31

Global outages across all three categories last week decreased from 363 to 265, a 27% drop, while in the US they decreased from 197 to 128, a 35% decline.

ISP outages globally decreased from 284 to 211, down 26%. In the US they decreased from 175 to 105, a 40% drop.

Worldwide, cloud-provider network outages decreased from 12 to nine, and remained the same in the US at two.

Globally, collaboration-app network outages increased from five to six, and in the US they increased by two, from three to five.

There were two major outages during the week. At 12:15 a.m. EDT on May 26, Verizon Business experienced an outage affecting customers and partners across countries including the US, Ireland, Poland, the Netherlands, Canada, the UK, Germany, and India. The outage appeared to be centered on Verizon Business nodes in New York, New York, and was divided into two occurrences spanning 45 minutes. The first lasted around nine minutes and initially appeared to be clearing, with the number of affected parties dropping, but about 20 minutes later the outage returned and lasted about 23 minutes, again centered on nodes in New York. 

Around 1:35 p.m. EDT on May 26, Cogent Communications experienced a series of outages totaling 48 minutes over the span of an hour and 10 minutes that impacted downstream providers and customers globally. The initial outage centered on Cogent nodes in Las Vegas, Nevada, and lasted around 12 minutes. Then the Cogent environment was stable for 10 minutes before experiencing a second occurrence on nodes in Dallas and Houston, Texas. Five minutes later, the Cogent node located in Dallas appeared to recover, but nodes in Kansas City, Missouri, experienced outages. After five more minutes the Kansas City nodes recovered, but nodes in Denver, Colorado, experienced outages. Forty-five minutes after the initial outage was observed, a 24-minute outage was observed on nodes in Dallas, Houston, and Kansas City. Ten minutes into the third occurrence, the number of locations exhibiting outage conditions expanded to include Salt Lake City, Utah; Oklahoma City, Oklahoma; and Denver. As the number of Cogent nodes involved increased, so did the number of customer networks that were affected. The outage was cleared around 2:45 p.m. EDT.

Updated May 24

Global outages across all three categories jumped from 252 to 363 last week, a 44% increase. In the US, outages increased from 123 to 197, a 60% increase.

Globally, ISP outages jumped from 180 to 284, up 58%, while in the US, ISP outages increased from 98 to 175, up 79%.

Worldwide, cloud-provider network outages declined slightly from 14 to 12 outages, but in the US, they increased from one to two. 

Collaboration-app network outages worldwide increased from three to five, and in the US increased from one to three.

There were three notable outages during the week.

Around 1:30 p.m. EDT on May 20, Slack experienced an interruption to its business-communication platform that lasted about 25 minutes and affected users accessing the services. A number of internal server errors were observed. Slack identified the cause as a code change that inadvertently affected some workspaces. Slack reverted the change and restored services by 1:55 p.m. EDT.

About 8:55 a.m. EDT on May 19, Coinbase experienced an interruption that lasted about two hours and affected global access to the Coinbase site and application. Connectivity and access across the network appeared to be unimpaired during the interruption, with initial requests simply timing out with system errors indicating system congestion. An hour and a half after the outage was first observed, services began to be restored, with access in APAC and EMEA still affected. The outage was cleared around 10:45 a.m. EDT.

On May 17, Hurricane Electric experienced an outage, divided into three instances over an hour and a half, that affected users across countries including the US, Australia, Singapore, Hong Kong, the UK, Brazil, Germany, South Africa, the Netherlands, and Canada. The first period was observed around 1:43 p.m. EDT centered on Hurricane Electric nodes in San Francisco and San Jose, California. After five minutes, the San Francisco nodes appeared to recover, reducing the scope of the outage. But five minutes after that, San Francisco nodes exhibited outage conditions again. Five minutes after this occurrence cleared, a second one lasting three minutes was observed centered on the San Jose and San Francisco nodes. Around 2:15 p.m. EDT, the nodes appeared to recover, temporarily clearing the outage, but an hour later, those two nodes exhibited outage conditions again before clearing after eight minutes. The total outage lasted around 26 minutes and was cleared around 3:25 p.m. EDT.

Updated May 17

Global outages across all three categories last week increased from 237 to 252, up 6%, while in the US, they increased from 95 to 123, a 29% jump.

ISP outages globally increased from 168 to 180, and in the US they increased from 76 to 98.

Cloud-provider network outages worldwide increased from 13 to 14 but dropped from four to one in the US.

Globally, collaboration-app network outages dropped from four to three, and in the US from two to one.

There were three significant outages this week.

About 3:30 p.m. EDT on May 12, NetActuate experienced an outage affecting downstream partners and customers in the US. It lasted around 13 minutes overall, divided into two occurrences spanning a 30-minute period. The first lasted four minutes and appeared to center on NetActuate nodes located in Raleigh, North Carolina. The outage reappeared 15 minutes later and lasted nine minutes, this time centered on the Raleigh nodes and on nodes in Durham, North Carolina, which increased the number of customers affected. The nodes in Durham cleared five minutes into the second period of the outage. The outage was cleared around 4:05 p.m. EDT.