
IT Disaster Recovery: Lessons from the CrowdStrike Incident

On July 19, 2024, the American cybersecurity company CrowdStrike distributed a faulty update to its Falcon Sensor security software, which caused widespread system crashes on approximately 8.5 million Windows devices. The outage led to the cancellation of over 5,000 flights and affected sectors such as healthcare, banking, and retail. Recovery required multiple reboots and manual intervention on affected machines and was expected to take several days. The episode showed how vulnerable businesses are when they rely solely on cloud services for critical operations and have no alternative means of keeping essential operations running.

Why prepare?

Insurance data shows that about 75 percent of businesses lack a Disaster Recovery Plan (DRP), that is, a set of steps for resuming critical operations as quickly as possible after a disaster. Without a recovery plan in place, a business can lose thousands of dollars per hour during an extended outage. Such losses can quickly become unsustainable and, in the worst case, force the business to close.

There are many reasons why businesses delay creating a DRP or skip it altogether, and they tend to fall into the following two areas:

Yet natural disasters, for example, can strike anywhere at any time, and they occur all too often. A recent example that caught businesses unprepared and caused massive losses was the severe convective storm activity in the United States during the first half of 2024. These storms had a significant economic impact, causing insured losses of approximately $10 billion and overall economic losses of around $14 billion, with frequent large hail and tornadoes as a primary driver of these costs. A widespread hail and tornado outbreak in mid-March alone resulted in estimated losses of $3.4 billion (Digital Insurance) (Insurance Journal).

The storms caused power outages and physical damage to data centers, disrupting IT services and causing downtime for businesses that depend on these facilities. They also damaged telecommunications infrastructure, including cell towers and fiber-optic cables, leading to widespread internet and communication outages that hampered businesses’ ability to maintain online operations, communicate with clients, and support remote workforces.

Illusion of a Cloud Safety Net

While natural disasters remain a serious threat and offer many reasons for continuity planning, the CrowdStrike incident reminds us of a relatively new risk factor: the cloud itself, which, ironically, is widely viewed as a panacea that keeps businesses safe from disasters.


Small businesses with little or no IT staff, along with a growing number of medium and large businesses adopting Cloud ERP and related offerings, are shifting internal IT responsibilities to contracted cloud service providers. This can give them a false sense of being well prepared for a disaster, even though outsourcing technical responsibilities brings its own set of challenges.

A cloud service provider can itself become a single point of failure. The CrowdStrike incident demonstrated how a single faulty update could cascade into a global crisis, affecting millions of devices and critical services. Businesses that rely exclusively on cloud services without robust failover capabilities, or that lack “plan B” procedures, are at a higher risk of operational paralysis in such events. Cloud services also depend on continuous, reliable internet connectivity. Any disruption in network connectivity, whether caused by technical issues, cyberattacks, or physical damage, can render cloud-based systems inaccessible.
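A “plan B” can start as simply as knowing which alternative system to switch to when the primary one stops responding. The Python sketch below tries a list of endpoints in priority order and returns the first one that passes a health check. The URLs and the `/health` path are hypothetical placeholders, and a real failover would also need data replication and DNS or load-balancer changes that this sketch does not cover.

```python
import urllib.request
from typing import Callable, Optional, Sequence

# Hypothetical endpoints in priority order: cloud first, on-premises fallback second.
ENDPOINTS = [
    "https://erp.example.com/health",
    "https://erp-onprem.example.internal/health",
]


def http_health_check(url: str, timeout: float = 3.0) -> bool:
    """Treat any 2xx response received within the timeout as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all subclasses of OSError.
        return False


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first endpoint that passes its health check, or None if none do."""
    for url in endpoints:
        try:
            if is_healthy(url):
                return url
        except Exception:
            # A health check that crashes is treated the same as a failed check.
            continue
    return None
```

Calling `pick_endpoint(ENDPOINTS)` from a scheduled job would tell staff (or an automated router) which system to use. Passing the health check in as a parameter keeps the selection logic testable without a network connection.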

Cloud-based IT has admittedly simplified technical infrastructure drastically for its users, but the reduced technical risks now come with legal complications that traditional IT teams are ill-prepared to handle. The “fine print” is often overlooked: Service Level Agreements (SLAs) with cloud providers typically include self-protective clauses, such as limits on the provider’s liability in the event of a service disruption. The CrowdStrike incident highlighted this issue, as the company’s liability for damages and lost revenue was minimal despite the extensive impact on its clients’ global operations. This leaves businesses bearing the brunt of the financial and operational consequences of failures in services they had expected to be always available and error-free. At the end of the day, the main goal of disaster recovery planning is the same as that of insurance: to minimize losses and stay afloat.

The changing nature of DRP/BCP bears on ERP deployment choices

Business continuity risk assessments must identify potential vulnerabilities in relying on cloud services. This includes evaluating the robustness of the provider’s disaster recovery plans, the geographical distribution of its data centers, the resilience of the network infrastructure, and, last but not least, the contractual “fine print”, to make sure that the “insurance” actually works as expected.

Clearly, Business Continuity Planning (BCP) goes beyond recovering data and information systems and requires the business itself to participate closely in IT disaster recovery planning. Real preparation means more than having trained personnel somewhere remote, with access to data or relevant applications, waiting to make good on a promise to be available at critical moments. Without a well-coordinated plan that acknowledges risks and prioritizes the recovery of specific business functions, a disaster recovery effort will be chaotic at best, with unpredictable and likely poor results.
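Prioritized recovery can be made concrete by listing business functions with a Recovery Time Objective (RTO, the longest outage the business can tolerate for that function) and an estimated cost per hour of downtime. The Python sketch below orders functions by tightest RTO first, breaking ties by cost, and estimates the loss of an outage of a given length. The function names and dollar figures are hypothetical, and this ordering rule is only one simple choice; a real plan would also account for dependencies between systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    rto_hours: float      # Recovery Time Objective: longest tolerable outage
    cost_per_hour: float  # estimated loss for each hour the function is down


def recovery_order(functions: List[BusinessFunction]) -> List[BusinessFunction]:
    """Tightest RTO first; among equal RTOs, the costliest outage first."""
    return sorted(functions, key=lambda f: (f.rto_hours, -f.cost_per_hour))


def outage_loss(function: BusinessFunction, outage_hours: float) -> float:
    """Estimated loss if the function stays down for outage_hours."""
    return function.cost_per_hour * outage_hours


# Hypothetical figures for illustration only.
FUNCTIONS = [
    BusinessFunction("Payroll", rto_hours=48, cost_per_hour=200),
    BusinessFunction("Order processing", rto_hours=4, cost_per_hour=5_000),
    BusinessFunction("CRM", rto_hours=24, cost_per_hour=1_000),
    BusinessFunction("Inventory", rto_hours=4, cost_per_hour=8_000),
]

if __name__ == "__main__":
    for f in recovery_order(FUNCTIONS):
        print(f"{f.name}: RTO {f.rto_hours} h, 3-day outage ~ ${outage_loss(f, 72):,.0f}")
```

With these made-up figures, Inventory and Order processing come first, and a three-day Inventory outage would cost about $576,000, which is the kind of number that helps justify spending on redundancy.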

With this in mind, businesses may need to consider diversified strategies that include on-premises solutions or, depending on their assessment of risks and costs, rely on on-premises solutions alone.

This, of course, applies to Enterprise Resource Planning (ERP) systems, which are central to many businesses’ operations. Deploying ERP systems on-premises can offer greater control over IT disaster recovery processes. Internal IT teams can manage and customize their DRP more predictably, tailoring it to the specific needs of the organization without relying on the cloud service provider’s disaster recovery capabilities. For example, internal IT teams can implement backup strategies that combine frequent snapshots, offsite backups, and redundancy measures tailored to the organization’s data recovery objectives. This makes execution more controlled and reliable, which reduces downtime and limits the impact of an outage. Such flexibility is often unavailable with cloud-based ERP, where backup processes are standardized and controlled by the service provider.
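Frequent snapshots only help if they actually meet the organization’s Recovery Point Objective (RPO), the maximum amount of recent data the business can afford to lose. The Python sketch below flags any stretch of time longer than the RPO that is not covered by a snapshot, and checks that a sufficiently recent copy exists offsite. The `Snapshot` record and its `offsite` flag are hypothetical; in practice the timestamps would come from whatever backup tool the IT team uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime
    offsite: bool  # hypothetical flag: True if a copy was shipped to a second location


def rpo_gaps(
    snapshots: List[Snapshot], now: datetime, rpo: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) windows longer than the RPO with no snapshot in between.

    A failure at the end of such a window would lose more data than the RPO allows.
    """
    if not snapshots:
        raise ValueError("no snapshots recorded; every failure would exceed the RPO")
    times = sorted(s.taken_at for s in snapshots) + [now]
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > rpo]


def offsite_is_current(snapshots: List[Snapshot], now: datetime, max_age: timedelta) -> bool:
    """True if the newest offsite copy is no older than max_age."""
    offsite_times = [s.taken_at for s in snapshots if s.offsite]
    return bool(offsite_times) and now - max(offsite_times) <= max_age
```

A nightly job could run both checks and alert the IT team when either fails, turning the recovery objectives in the DRP into something that is verified continuously rather than assumed.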

However, the transition to cloud-based solutions is a broad trend in enterprise software, with vendors shifting from offering on-premises applications to focusing primarily, and in some cases exclusively, on the SaaS (Software as a Service) model. SAP is one notable example. While still supporting some on-premises solutions, SAP has been actively encouraging its customers to migrate to its cloud-based ERP (SAP S/4HANA® Cloud), emphasizing the benefits of continuous innovation and reduced infrastructure management responsibilities, while also benefiting from the predictable revenue of subscription-based licensing. A similar story applies to the other three ERP giants, Oracle, Microsoft, and Infor, and is not limited to them.

Odoo’s hosting options help optimize recovery planning

Odoo, on the other hand, offers flexible deployment options, including Odoo Online (SaaS), Odoo.sh (PaaS), and on-premises hosting, making it a strong choice for businesses aiming to achieve robust business continuity. 

Types of Odoo Hosting: Odoo Online, Odoo.sh, and Odoo On-Premises

Its on-premises deployment option provides greater control over data security, customization, and compliance, which is essential for industries with stringent regulatory requirements. It also enables internal IT teams to manage disaster recovery plans more predictably and effectively, reducing reliance on external factors.

Users can choose the deployment architecture that best fits their needs and integrate the various components through Odoo’s external API (XML-RPC and JSON-RPC) and other integration tools. This flexibility allows businesses to host some parts of their Odoo solution on-premises and run others in the cloud, an approach known as “hybrid deployment”, with data connected and synchronized between the two setups. Odoo.sh provides development, staging, and production environments that can be synchronized with on-premises setups, supporting continuous integration and deployment workflows.
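Odoo documents an external API over XML-RPC, with authentication at `/xmlrpc/2/common` and model calls made through `execute_kw` at `/xmlrpc/2/object`. The Python sketch below uses it to read records changed since a given time on each side of a hybrid deployment, then decides which records to copy in which direction with a simple last-writer-wins rule on Odoo’s `write_date` field. The server URL, the idea of a shared business key field, and the conflict rule are all assumptions for illustration; deletions and concurrent edits would need more careful handling in a real synchronization.

```python
import xmlrpc.client
from typing import Any, Dict, List, Tuple


def connect(url: str, db: str, username: str, password: str) -> Tuple[int, Any]:
    """Authenticate against an Odoo server (url is hypothetical, e.g. https://erp.example.com)."""
    common = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/common")
    uid = common.authenticate(db, username, password, {})
    if not uid:
        raise PermissionError("Odoo authentication failed")
    models = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/object")
    return uid, models


def changed_since(models: Any, db: str, uid: int, password: str,
                  model: str, key_field: str, since: str) -> Dict[str, str]:
    """Map each record's shared key (hypothetical key_field) to its write_date."""
    records = models.execute_kw(
        db, uid, password, model, "search_read",
        [[["write_date", ">", since]]],
        {"fields": [key_field, "write_date"]},
    )
    # Records with an empty key come back as False and cannot be matched across sides.
    return {rec[key_field]: rec["write_date"] for rec in records if rec[key_field]}


def plan_sync(cloud: Dict[str, str], onprem: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Last-writer-wins: return (keys to copy on-premises, keys to copy to the cloud).

    write_date values use the 'YYYY-MM-DD HH:MM:SS' format, so plain string
    comparison orders them chronologically.
    """
    to_onprem = sorted(k for k, ts in cloud.items() if k not in onprem or ts > onprem[k])
    to_cloud = sorted(k for k, ts in onprem.items() if k not in cloud or ts > cloud[k])
    return to_onprem, to_cloud
```

Keeping the decision logic in `plan_sync`, separate from the network calls, makes it easy to test and to swap in a stricter conflict rule later.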

Odoo.sh also performs automated backups, which can be downloaded and restored into an on-premises database so that both environments have up-to-date information.

A typical hybrid setup hosts Manufacturing Execution Systems (MES), inventory management, and production scheduling on-premises to ensure real-time performance and data security, while CRM, sales, and e-commerce modules run in the cloud, providing scalability and remote access for sales teams and customers.

A more detailed, easy-to-read overview of each of the three Odoo hosting options can be found in our “Types of Odoo Hosting” guide.

Conclusion

The July 2024 CrowdStrike incident is a stark reminder that while cloud services offer numerous benefits, they also introduce significant risks that cannot be ignored. Businesses must adopt comprehensive BCP and IT DRP strategies that account for potential failures of cloud service providers and other dependencies. By diversifying their redundancy, conducting thorough risk assessments, employing multi-provider strategies, investing in robust connectivity, and regularly training IT and other staff, businesses can strengthen their resilience and maintain continuity in the face of unforeseen disruptions. Additionally, on-premises ERP systems (such as Odoo) can provide greater control, predictability, and security in disaster recovery planning, further strengthening a company’s overall disaster preparedness. The lessons from the CrowdStrike incident should motivate IT leaders to re-evaluate their reliance on cloud services in general and their disaster recovery frameworks in particular.

Don’t let server disasters catch you off guard

Feel free to ask your questions in the comments section below, share this article across your network, and subscribe to our newsletter.
