What CrowdStrike Teaches Us About Supply Chain Management
This summer’s global IT outage, which took down millions of computers and grounded airlines worldwide, demonstrated just how difficult it is to manage a modern digital supply chain. Can effective customer risk management help plug the gaps?
On July 19, reports began emerging of a massive outage in Windows systems spreading across multiple industries. Endpoints were down and failed to reboot properly, leading to the most significant IT outage in history. The problem stemmed from a logic error in an update to CrowdStrike’s security software, which downed 8.5 million systems, according to Microsoft. Recovery was a complex, manual process. Along with the cancelled flights, banking customers and healthcare patients also suffered as the outage affected financial transactions and even delayed surgical procedures.
“The recent incident reinforces how growing reliance on interconnected IT systems has expanded the risk surface,” said Mark E. Green, chair of the House Homeland Security Committee, who will be hearing testimony from CrowdStrike on September 24th. It’s harder than ever to manage supply chain risk, given not just the size of those chains but the complexity of the software products and processes that they provide.
“Told You So” Says the GAO
The US Government Accountability Office doesn’t like to say it told you so, but would also like to remind you that it did. In a blog post after the incident, it pointed out strong similarities between this unintentional error and the cyber attack that compromised SolarWinds and its customers in 2020.
Since May 2010, the GAO has made 1,610 recommendations spanning four cybersecurity challenge areas. One of these, establishing a comprehensive cybersecurity strategy and performing effective oversight, is the primary area of focus for supply chain risk. In a June report on government efforts to mitigate cyber risk, the GAO said agencies have failed to implement 43% of the recommendations in this area.
Managing A Consolidated Supply Chain
“The federal government needs to take actions to perform effective oversight, including monitoring the global supply chain,” the GAO added in its report. But how do you implement effective oversight on a powerful company that controls the lion’s share of the market?
Dan Geer, who contributed to the X Window System and Kerberos and is now a senior fellow at In-Q-Tel, famously explored how technology monocultures affect security in his 2003 report CyberInsecurity: The Cost of Monopoly. It hit enough nerves in corporate tech that he was fired the day after its publication.
Twenty-one years later, and a week after the CrowdStrike outage, he reminded us of the problem:
“We know that protective redundancy does not just happen, and we know that a jillion devices all alike offers no protection at all but rather the opposite,” he said. “We know that the wellspring of risk is dependence, and we know that aggregate risk is proportional to aggregate dependence.”
As Geer points out, market consolidation that produces jillions of devices all alike is an economic trend. Windows has over 70% of the desktop OS market, and two companies—Microsoft and CrowdStrike—own 44% of the endpoint protection market between them.
This trend hasn’t reverted since 2003. Neither will it revert in the future. There are many reasons why companies choose the same software as their peers, ranging from availability (the number of available solutions consolidates over time) through to risk aversion (no one ever got fired for buying IBM) and manageability (it’s easier to manage and support a fleet of a thousand Windows machines than a panoply of different endpoint operating systems).
Doubtless, large companies do their best to follow cybersecurity best practices, but mistakes occur. The US government’s Cyber Safety Review Board found that Microsoft’s security culture was “inadequate and requires an overhaul” after a state-linked hacking group compromised a cryptographic signing key and used it to access the email accounts of senior US government officials. CrowdStrike, while suitably self-effacing as long as it doesn’t cost too much, released a buggy update that it failed to catch.
Microsoft and CrowdStrike held a closed-door meeting on September 10 to discuss how they can avoid this kind of thing happening again. They discussed measures such as relying less on kernel mode, where buggy software can cause the extreme damage seen in July. As CrowdStrike explains, that requires some work on Microsoft’s part.
CrowdStrike also said that it would now follow other measures, such as using canary releases—a basic best practice for scaled deployment—to limit damage to a smaller number of machines.
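To make the idea of a canary release concrete, here is a minimal sketch in Python. It is an illustration of the general technique, not a description of CrowdStrike's actual process: the function names, the stage fractions, and the failure threshold are all assumptions chosen for the example. Each host is hashed into a stable bucket, the update goes to a small slice of the fleet first, and the rollout halts if too many updated hosts report as unhealthy.

```python
import hashlib


def rollout_bucket(host_id: str, update_id: str) -> float:
    """Map a host deterministically to a value in [0, 1) for a given update."""
    digest = hashlib.sha256(f"{update_id}:{host_id}".encode()).digest()
    # 32 bits divided by 2**32 is exact in floating point and always below 1.0.
    return int.from_bytes(digest[:4], "big") / 2**32


def hosts_in_stage(hosts, update_id, fraction):
    """Return the hosts whose bucket falls below the stage's rollout fraction."""
    return [h for h in hosts if rollout_bucket(h, update_id) < fraction]


def staged_rollout(hosts, update_id, deploy, is_healthy,
                   stages=(0.01, 0.1, 0.5, 1.0), max_failure_rate=0.02):
    """Deploy in widening stages and halt if updated hosts start failing.

    `deploy` and `is_healthy` are hypothetical callbacks supplied by the
    caller. Returns (completed, updated_hosts).
    """
    updated = []
    done = set()
    for fraction in stages:
        # Only touch hosts that joined the rollout at this stage.
        for host in hosts_in_stage(hosts, update_id, fraction):
            if host not in done:
                deploy(host)
                done.add(host)
                updated.append(host)
        if updated:
            failures = sum(1 for h in updated if not is_healthy(h))
            if failures / len(updated) > max_failure_rate:
                return False, updated  # stop before the damage spreads
    return True, updated
```

With a faulty update, a rollout like this stops after the first stage that reaches any hosts, so only a small share of the fleet needs manual recovery. A real pipeline would also wait between stages long enough for failures to surface, which this sketch leaves out.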
While it’s laudable that such large companies are learning these lessons now, it’s concerning that they’re doing so on their customers’ dime.
What Customers Can Do About It
It’s important to follow supplier management controls such as those in ISO 27001 Annex A 5.22 but, as ISMS.online says, you also need to be pragmatic about how much influence you can have on a large supplier.
Nevertheless, ISO 27001 has much to offer regarding incident response preparedness. It begins with planning for risk through thorough assessment, as outlined in ISO 27001 clause 6. It also provides valuable guidance in Control 5.30, which prepares organisations for business continuity in the event of a problem.
These practices might not protect a company from a major incident upstream in the supply chain, but they can help to minimise the impact of such events downstream, helping to maintain services for customers and business partners.