A Cautionary Tale: Why Global Routing Systems Must Implement MANRS

May 20, 2021

Featured article by Leo Vasiliou, Director of Product Marketing, Catchpoint

The pandemic has made strong digital connections more critical than ever. Anything that negatively impacts the digital experience can rapidly become a major problem. It’s even more worrisome when the issue causes widespread outages that affect a multitude of providers.

In fact, this is exactly what happened on April 16, 2021, when a significant BGP leak caused widespread network outages that affected major network operators as well as cloud and CDN providers. This incident was a classic origin hijack originating from Vodafone Idea (AS55410), an Indian operator based in Mumbai and Gandhinagar.

The Vodafone Idea ASN was inundated with traffic 13 times higher than average, leaving its users unable to access the internet. Most likely, the issue was caused by an erroneous route advertisement made by one of their customers, as reported by Medianama.

At Catchpoint, we used our digital experience monitoring (DEM) solution to analyze the network and BGP data during the incident. This enabled us to understand what caused the issue, how the incident progressed, and the impact it had on end-user experience. We also came to some conclusions as to how such an incident can be avoided in the future.

Let’s break it down.

What Went Wrong And Where? 

According to CAIDA ASRank, Vodafone Idea (AS55410) is an Autonomous System with five different providers:

* CenturyLink (AS3549-AS3356)
* PCCW Global (AS3491)
* CW Vodafone (AS1273)
* Bharti Airtel (AS9498)
* Tata Communications (AS4755)

Right before the incident, this AS was announcing 823 IPv4 subnets and 8 IPv6 subnets.

We analyzed data from the RIS rrc00 route collector, the most populated collector deployed by RIPE NCC, and found that on April 16, 2021, at 1:48:58 PM GMT, AS55410 went into a frenzy, announcing more than 34,000 networks that did not belong to it. These networks were already announced on the Internet by their legitimate owners. As a result, a large portion of user traffic was redirected to AS55410 instead of its proper destination. The result: disruption of services for 3,500 companies around the world.
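To make the idea of an origin hijack concrete, here is a minimal Python sketch of how such announcements could be flagged. It is not Catchpoint's method: the function name `find_origin_hijacks`, the baseline table, and the sample prefixes and ASNs (taken from the ranges reserved for documentation) are all hypothetical, and the check only handles exact prefix matches, not more-specific announcements.

```python
import ipaddress


def find_origin_hijacks(baseline, announcements, suspect_asn):
    """Return (prefix, expected_origin) pairs for prefixes that suspect_asn
    originates even though the baseline maps them to a different AS."""
    hijacked = []
    for prefix, as_path in announcements:
        net = ipaddress.ip_network(prefix)
        expected_origin = baseline.get(net)
        observed_origin = as_path[-1]  # the origin AS is the last hop of the AS path
        if (observed_origin == suspect_asn
                and expected_origin is not None
                and expected_origin != observed_origin):
            hijacked.append((prefix, expected_origin))
    return hijacked


# Hypothetical baseline: who originated each prefix before the incident.
baseline = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("203.0.113.0/24"): 55410,
}

# Hypothetical announcements seen by a route collector: (prefix, AS path).
announcements = [
    ("192.0.2.0/24", [64500, 9498, 55410]),    # origin changed to AS55410
    ("198.51.100.0/24", [64500, 64497]),       # still announced by its owner
    ("203.0.113.0/24", [64500, 1273, 55410]),  # AS55410's own prefix
]

hijacks = find_origin_hijacks(baseline, announcements, suspect_asn=55410)
print(hijacks)
```

With this sample data, only the first announcement is flagged, because it is the only one where AS55410 appears as origin for a prefix the baseline assigns to someone else.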

Organizations impacted included the majority of the national telcos (Deutsche Telekom, TIM, Claro, Orange, Telefonica), CDNs (Google, Akamai, Edgecast), and even banks (Punjab National Bank). A number of Vodafone services were also impacted by the hijack.

Figure 1

The RIS rrc00 collector recorded ~225k BGP packets with AS55410 as the originating AS in the AS path between 1:45 PM GMT and 3:00 PM GMT. Most were announcements of hijacked networks. Of the 73 peers sharing data with rrc00 (60 of which share a full routing table), 64 received at least one hijacked network. This likely indicates that the hijack spread to most of the ASes on the Internet. As the graph above (Fig 1) shows, it took about one hour for most (but not all) of the hijacked routes to be removed from the collector, and hence from the Internet.

Note that two of the providers of AS55410 did not have a good filtering mechanism in place to prevent such an event. This contributed to the spread of the attack.

In the graph below (Fig 2), we break down the evolution of hijacked networks by provider of AS55410. In other words, for each provider we looked at the number of hijacked networks whose AS path ended with that provider's AS followed by 55410.
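The per-provider breakdown described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis behind Fig 2: the function name `hijacked_networks_per_provider` and the sample routes are hypothetical, and the sketch collapses AS-path prepending so that a path ending in `55410 55410` is still attributed to the provider just before it.

```python
from collections import defaultdict


def hijacked_networks_per_provider(routes, origin_asn=55410):
    """Count distinct prefixes per provider, where the provider is the AS
    immediately before origin_asn at the end of the AS path."""
    per_provider = defaultdict(set)
    for prefix, as_path in routes:
        # Collapse consecutive duplicates caused by AS-path prepending.
        path = [asn for i, asn in enumerate(as_path)
                if i == 0 or asn != as_path[i - 1]]
        if len(path) >= 2 and path[-1] == origin_asn:
            per_provider[path[-2]].add(prefix)
    return {provider: len(prefixes) for provider, prefixes in per_provider.items()}


# Hypothetical hijacked routes as seen by different collector peers.
routes = [
    ("192.0.2.0/24", [64500, 9498, 55410]),
    ("198.51.100.0/24", [64501, 9498, 55410, 55410]),  # prepended origin
    ("192.0.2.0/24", [64502, 1273, 55410]),
    ("192.0.2.0/24", [64503, 9498, 55410]),            # same prefix, same provider
    ("203.0.113.0/24", [64500, 64496]),                # not originated by AS55410
]

counts = hijacked_networks_per_provider(routes)
print(counts)
```

Counting distinct prefixes rather than raw paths avoids inflating a provider's total when many peers report the same hijacked network through it.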

Figure 2

A Few Unusual Observations

There were a few unusual facts about the situation. For instance, three providers managed to avoid spreading the event. How did this happen?

Marco Marzetti, a member of the peering team at PCCW (AS3491), explained to us that they have several defense mechanisms in place on their routers, including accepting only those routes belonging to a selected list of prefixes and dropping RPKI invalids. Most likely, the first mechanism was the most effective, since only ~20% of the prefixes involved were covered by a signed ROA and therefore showed as RPKI invalid.
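The two mechanisms mentioned here, a customer prefix list and dropping RPKI invalids, can be sketched as follows. This is a simplified illustration, not PCCW's router configuration: the function names, the exact-match prefix list, and the sample ROA and prefixes are assumptions. The validation logic follows the valid / invalid / not-found states defined for BGP prefix origin validation in RFC 6811.

```python
import ipaddress


def rpki_state(prefix, origin_asn, roas):
    """Classify a route against ROA entries given as (prefix, max_length, asn)."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA covers the route if the route's prefix falls inside the ROA prefix.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if roa_asn == origin_asn and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"


def accept_route(prefix, origin_asn, allowed_prefixes, roas):
    """Accept a customer route only if it is on the prefix list (exact match,
    a simplification) and it is not RPKI invalid."""
    net = ipaddress.ip_network(prefix)
    return net in allowed_prefixes and rpki_state(prefix, origin_asn, roas) != "invalid"


# Hypothetical data: one prefix the customer may announce, one ROA owned by another AS.
allowed_prefixes = {ipaddress.ip_network("192.0.2.0/24")}
roas = [("198.51.100.0/24", 24, 64496)]

for prefix in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"):
    print(prefix, rpki_state(prefix, 55410, roas),
          accept_route(prefix, 55410, allowed_prefixes, roas))
```

In this sample, `203.0.113.0/24` has no ROA, so RPKI alone would not reject it; only the prefix list does. That mirrors the observation that the prefix list was likely the more effective filter when few of the hijacked prefixes had ROAs.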

Another interesting finding is that 1,062 networks belonging to 168 ASes are still seen by some peers of RIS as hijacked by AS55410. Of these 1,062 networks, 991 are seen as announced via Bharti (AS9498), while the remaining 71 networks are seen as announced only by CW Vodafone (AS1273). This could be a case of ghost routes, but most likely the network operators of AS55410 still have something to fix in their routing with these two providers.

Incident Impact 

We also took a look at how the incident impacted our customers. We used Catchpoint’s BGP tools to analyze our customer networks, drawing on our dedicated BGP infrastructure along with RIPE NCC RIS and the University of Oregon Route Views Project.

By focusing on one of the affected networks during the incident time span, we discovered that peers from all of our data sources were impacted. Analyzing the public route collectors, we found that 234 of the peers of RIS and 162 of the peers of Route Views were seeing the network as hijacked. This involved peers from all over the world.

How To Avoid The Problem Moving Forward?

This incident was caused by AS55410. However, it is important to note that this kind of event could have been mitigated if all the providers of AS55410 were applying proper defense mechanisms on their routers as outlined by Mutually Agreed Norms for Routing Security (MANRS).

Most likely, if they had been dropping RPKI invalid routes and if each of the affected organizations had signed its network resources in RPKI, this event would never have occurred. This incident was a reminder to network operators that they must implement MANRS to deal with such routing security threats.

About the Author

Leo Vasiliou leads the product marketing efforts with a passion for data analysis as applied to monitoring and performance data. With more than ten years of experience leading production operations, web performance and security programs for Ask.com and Ask Jeeves, Leo helps communicate the value of availability, performance and reliability of websites, webapps and other internet services. He started his career in IT infrastructure and operations in top secret intelligence training facilities in the United States Air Force.

 
