Updated: The company's team made a sterling effort to boost resources and the site is now working fine. Read on for one CTO's take on what fixing this kind of traffic surge can involve...
Popular real-time air traffic tracking website Flightradar24 has been forced to put even paying "gold" and "silver" subscribers into a lengthy queue -- with its servers overloaded Thursday amid a surge in use by open source intelligence (OSINT) and other users tracking aviation activity in and around Ukraine after Russia's invasion.
"Due to extremely heavy load, some users may experience slowness or temporary connection issues accessing Flightradar24. We're working on increasing available performance now" the company told users. Traffic on the site has gone from some three million users per day to a million users per hour in the space of a few hours.
The outage was the latest reminder that capacity planning and the ability to scale computing resources rapidly matter and continue to catch companies out -- even if the outbreak of war is not in every IT team's contingency plans. The company advertised for a site reliability engineer (SRE) just this week, noting that "when something exciting happens, such as when the pope or the Antonov An-225 are flying, our servers get pummeled."
Flightradar24 started as a hobby project in 2006 when two Swedish aviation geeks decided to build a network of ADS-B (aircraft location) receivers in Northern and Central Europe. In 2009 they opened up the network and made it possible for anyone with an ADS-B receiver to upload data to it. It boasts the largest ADS-B network in the world with over 30,000 connected receivers and has had over 50 million application downloads.
(Its globally distributed network of some 30,000 Raspberry Pi-based radio receivers is dedicated to feeding real-time aircraft data into the Flightradar24 platform, the company has previously revealed.)
Previous job roles posted by the company suggest that it runs a mixed system infrastructure "based on a modern virtualized data center environment as well as AWS and Azure cloud services" -- with the SRE role suggesting that it was building a "future private cloud based on open technologies like OpenStack, Ceph, and KVM."
With Flightradar24 down (hugops) it's worth reflecting on scalability...
Asked to reflect on what it takes to respond to this kind of server-pummelled incident, one seasoned CTO, Steve Chambers, told us that "[if the bottleneck is on-prem] they’d need to re-assign existing servers or rack new ones. If they have them on hand. If they don’t then that’s not possible obvs so they have to borrow from somewhere else or wait. Then, do they have the processes to add capacity -- image boxes and add the apps, then update network to add them into the pool. If you don’t exercise this muscle often, then it’s usually a broken pain in the ass."
"THEN they find that 'doubling the server capacity' leads to back end DB issues and they need to scale THAT layer out (usually adding read-only clones) and then that means updating the servers again to add new read-only clones to the app-db connection pool… it’s like a cascading config nightmare that on-prem struggles with, because it’s rarely well-architected for scaling up... it's very difficult to do right. If someone thinks 'just add some more VMs' then IF they have that capacity, ok, but you still need to do all the front end network (adding new VMs/hosts to a load balancer, FW, etc) and the back end DB (usually the bit that breaks)."
He added: "It’s why serverless in cloud is great. You’ve actually architected your system and processes to scale often without interference. There are serverless databases (eg AWS DynamoDB) as well as the app layer. Containers can add more app instances quickly if they have architected the app that way..."
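As a rough illustration of how a container platform decides to "add more app instances", the sketch below applies a simple proportional rule, the same shape of calculation documented for the Kubernetes Horizontal Pod Autoscaler. The function name, parameters and numbers are hypothetical, and the article does not say which autoscaling mechanism, if any, Flightradar24 uses.

```python
import math


def desired_instances(current_instances, load_per_instance,
                      target_load_per_instance, max_instances):
    """Proportional scaling rule (illustrative sketch, not a real API).

    Scales the instance count by the ratio of observed load to target
    load, rounds up, and clamps the result to [1, max_instances].
    """
    raw = math.ceil(
        current_instances * load_per_instance / target_load_per_instance
    )
    return max(1, min(raw, max_instances))


# Hypothetical surge: 4 instances each seeing 800 requests/s against a
# target of 100 requests/s per instance.
uncapped = desired_instances(4, 800, 100, max_instances=50)
capped = desired_instances(4, 800, 100, max_instances=20)
```

The `max_instances` cap matters: the rule can ask for more instances than the budget or the back-end database can support, which echoes Chambers' point that the app layer is rarely the only bottleneck.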
The Flightradar24 outage came as the European Union Aviation Safety Agency (EASA) told airlines that "operators should not operate within [Ukrainian] airspace, including landing and departures from airports located in [its] airspace. Additionally, operators should exercise caution when operating in the whole FIR Moscow (UUWV) and FIR Rostov (URRV) due to heightened military activity which may include launches of mid-range missiles penetrating into controlled airspace... critical infrastructure, including airports, are exposed to military activities which result in safety risks for civil aircraft. In particular, there is a risk of both intentional targeting and misidentification of civil aircraft. The presence and possible use of a wide range of ground and airborne warfare systems poses a HIGH risk for civil flights operating at all altitudes and flight levels."