Surviving the unpredictable: Resilient data-center strategies for business continuity

By Kevin Reed, CISO, Acronis

As I write this, the Singapore Parliament is hearing the Monetary Authority of Singapore's answers to a parliamentary inquiry about the recent incidents at Singapore banks that affected more than 2.5 million transactions. The incidents were caused by a datacentre cooling system malfunction, which casts doubt on datacentre reliability.


Why did the cooling system fail? Was it a design issue? Were the tech stacks poorly serviced?
The service provider blamed a contractor and human error during routine maintenance. There are no definitive answers yet, but clearly datacentres need to be more resilient. How do we build that resilience? There are three datacentre essentials that should help enterprises do exactly that.

Step 1: Perform a comprehensive risk assessment
Datacentre outages could be caused by mistakes in design and equipment installation, poorly performed servicing, operational issues, or even a cyberattack.

A good starting point for ensuring the datacentre design is solid is to have the design and implementation validated and certified by the Uptime Institute. The Uptime Institute's Tier Classification System is a widely recognised and respected standard for evaluating the reliability and availability of datacentres. It offers a framework for assessing and ranking datacentres based on their design and operational resilience.

A Tier I certification is the lowest level of reliability, while Tier IV datacentres offer the highest level of fault tolerance. A Tier IV DC is designed to withstand the most severe disruptions, ensuring near-continuous uptime. Organisations setting up a private cloud or hosting a large data repository will find maintaining a disaster recovery (DR) program extremely beneficial. In outages such as those caused by ransomware attacks, a DR program should help them bounce back within a few minutes. In the recent case involving hospitals across Singapore, a designed failover to another DC or a DR site could have helped reinstate services faster than the seven-hour delay that occurred.
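To put "near-continuous uptime" and the seven-hour delay into perspective, an availability target can be converted into an annual downtime budget. The sketch below is illustrative only: the function name and the percentage targets are assumptions chosen for the example, not figures from the Uptime Institute or from the incidents discussed here.

```python
def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per 365-day year implied by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a non-leap year
    return minutes_per_year * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical availability targets, chosen only to show the arithmetic.
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = annual_downtime_minutes(target)
        print(f"{target}% availability allows {budget:.1f} minutes of downtime per year")
```

Under these assumptions, a single seven-hour (420-minute) outage would consume roughly 80% of the yearly budget at a 99.9% target and would exceed the budget at a 99.99% target about eight times over.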

Step 2: Implement robust operational protocols
The quality of operational and servicing processes and procedures directly contributes to datacentre resilience. An example from Singapore shows how a single human error can cause damages running into the millions.

To ensure flawless execution of changes and maintenance, and the implementation of robust operational protocols, two key factors need to be considered: employee training, and testing and drills. Often, the weight of compliance and of following protocols can slow enterprises down.

But compliance can be one way of meeting and even surpassing industry norms. In the world of cybersecurity, the more compliance, the better the level of security. Compliance with international data protection standards ensures that data transfer and storage practices are in accordance with the laws of the countries involved. Compliance with these norms helps put in place additional safeguards such as encryption and data protection agreements, which in turn can improve the privacy of sensitive customer information.

The probability of a large-scale cyberattack, such as a DDoS attack, causing DC downtime should not be overlooked. For instance, the Colonial Pipeline attack, which disrupted supplies to the US Southeast, originated from a single compromised password. The Acronis mid-year 2023 cyberthreats report also states clearly that enterprises were losing business and money, with data from over 400 victims leaked by ransomware. With threat vectors growing more complex, to the extent of using artificial intelligence, enterprises need to scour their environments continually for fresh threats.

Step 3: Build zero trust & execute
Zero-trust approaches may seem farcical but are the only way to go. A zero-trust approach to mission-critical processes will be inadequate without solutions like endpoint data loss prevention (DLP). Endpoints such as laptops, smartphones, and tablets are common points of access for data loss and espionage. Although device management tools have matured, enterprises need to scout for options that offer monitoring as a single package spanning the network, cloud, and endpoint devices.

To most industries, resilience remains a destination rather than a journey. Despite the growing risks, datacentres are here to stay. They need to be managed efficiently across power, cooling, connectivity, operations, and security to ensure that they are resilient. Finding such a solution may seem like a ludicrous proposition, but given the need to be digitally present, a resilient framework is indispensable to the well-being of the enterprise and the industry.
