When Routine Updates Go Wrong: Lessons from the CrowdStrike Incident

Sep 6, 2024 | Press Hits

This article, made public through BridgeView’s PR Services, details the severe disruption that began on July 19, 2024, when Delta Air Lines, other major airlines, and organizations around the world experienced unprecedented system outages. A faulty software update from CrowdStrike triggered a massive IT failure that affected critical infrastructure worldwide. The incident is a stark reminder that even routine updates need rigorous testing and security checks to prevent such catastrophic disruptions in the future.

Initially posted in Enterprise Security Tech.

Beginning on July 19, 2024, Delta Air Lines’ systems were disrupted for more than six days, causing widespread delays, cancellations, and seas of stranded passengers. Flights worldwide were affected as the computers these services rely on failed. Three of the largest airlines in the United States (American Airlines, Delta Air Lines, and United Airlines) were grounded.

The IT outage resulted from a faulty software update initiated by CrowdStrike. The defective update caused a meltdown within the Windows operating system. Airlines were not the only ones affected; organizations worldwide, from banks to retailers to law enforcement, were impacted. McDonald’s closed some of its stores in Japan; Starbucks locations closed after the company’s mobile ordering system went down; Alaska State Troopers and the New Hampshire Department of Safety reported shutdowns.

A routine software update caused all this chaos. A defect in one of CrowdStrike’s updates for computers running Windows produced what is arguably one of the most significant information technology outages in history. The incident is a stark reminder of the risks that even routine software updates carry. CrowdStrike’s updates are routine, but an error in this one conflicted with Windows and proved catastrophic. Because the software is so deeply integrated with the operating system, installing the update caused computers running Windows to crash.

From smartphones to laptops, automatic software updates are common, and we unquestioningly trust them as the necessary process to ensure app performance and device security. However, when it comes to an organization’s critical IT infrastructure, updates should not be executed blindly. When companies administer software updates, they should not bypass the traditional development and testing cycle, which shows how new code will behave on the network. Aside from grinding devices to a halt, untested code applied to the network can open attack vectors that may not be found for many months.
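One common way to avoid applying an update everywhere at once is a staged rollout: the update goes to a small test group first and only moves on if those machines stay healthy. The sketch below is a minimal illustration of that idea, not a description of how CrowdStrike or any particular vendor deploys updates. The `Ring` class, the `staged_rollout` function, and the `apply_update` and `is_healthy` callbacks are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A hypothetical group of machines that receives the update together."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Apply an update ring by ring, halting at the first ring that fails its health check.

    Returns the names of the rings that passed. Both callbacks are assumptions:
    in practice they would wrap whatever deployment and monitoring tools are in use.
    """
    completed: List[str] = []
    for ring in rings:
        for host in ring.hosts:
            apply_update(host)
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if failure_rate > max_failure_rate:
            # Stop here: rings later in the list never receive the update.
            break
        completed.append(ring.name)
    return completed
```

Ordering the rings from a test lab to a small pilot group to production means a defective update is caught while it affects only a handful of machines, at the cost of a slower rollout.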

Although there is no proof (yet) that the CrowdStrike failure was due to a hack or malicious actions, it’s important to remember that when things go wrong, bad actors will try to exploit the situation. Case in point: immediately after this incident, a malicious file claiming to be a quick fix circulated, but the so-called “CrowdStrike hotfix” was malware.
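One basic safeguard against a fake fix is to compare a downloaded file's checksum with a value obtained separately from the vendor's official channel before running it. The sketch below assumes such a SHA-256 value is available, which is an assumption made for illustration and not something this article confirms for any specific vendor. The function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, published_hex: str) -> bool:
    """Check a file against a digest obtained separately from a trusted source.

    `published_hex` is assumed to come from the vendor's official channel,
    not from the same email or website that supplied the file.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

A match shows only that the file is the one the vendor published; it does not replace the testing and review an update should still go through.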

This is the first time a trusted provider has been the source of a widespread outage of this magnitude, and employees must be prepared for, and understand, the steps to take when this happens. Continuous cybersecurity training for employees at every level is a crucial, proactive measure, and it is essential to start now rather than wait for the next incident. With this CrowdStrike disaster, getting machines back up and running was manual and laborious because the servers that would have deployed a fix were also down.

With the increase in remote work, fixes are more complicated. In some cases, IT personnel had to go door to door fixing machines one at a time. Some employees were instructed to bring their laptops to a central location, and sometimes, IT had to walk people through the fixes over the phone. There were reports of flash drives with scripts to implement a fix being mailed nationwide. The CrowdStrike incident pushed us back decades in terms of implementing fixes.

Although the CrowdStrike incident was caused by faulty code, it underscores how much damage a worldwide outage caused by hacking could immediately inflict. A 24/7 security operations center (SOC) is therefore a necessity for providing strategic insight and assistance. Not only can it identify anomalies immediately, but it can also help isolate the problem to prevent further penetration. A good SOC will sound the alarm, initiate actions to “stop the bleed,” and bring in resources for mitigation. Perhaps the best lesson from this blue screen of death disaster is to apply standard IT security checks to software updates, even those from a fully trusted partner.

Chris Snyder is a cybersecurity expert and Principal Sales Engineer at Quadrant Security, having honed his skills as a Systems Administrator, Threat Analyst, and Paratrooper Infantryman in the US Army. Chris leverages his diverse background and cybersecurity knowledge to help clients find the best security solutions for their unique needs.
