How to manage disaster recovery in cloud computing
Since the pandemic, remote working has become the new normal and cloud adoption has gained momentum. Naturally, organisations want to make sure their cloud infrastructure keeps working even if a disaster strikes. Disasters can be natural, like floods and earthquakes, or man-made, like power outages or a cloud provider’s data centre going offline due to internal issues.
The acceleration in cloud adoption has prompted organisations to evaluate their Disaster Recovery (DR) policies across AWS, Azure, Google Cloud Platform (GCP), and other public clouds. Importantly, Business Continuity and Disaster Recovery strategies have to complement each other for an organisation to function smoothly.
One of the critical aspects of business continuity is being prepared for technical disasters. Whether the threat is a hardware or software malfunction, a cyber-attack, or a natural disaster, it is important to be vigilant in safeguarding data.
Data loss can have a significant financial impact on businesses, as well as a negative effect on their reputation owing to a loss of client confidence. As a result, careful preparation and a roadmap for coping with future disasters are critical for protecting a company's long-term prospects.
According to a recent research report, 54 percent of companies have experienced a prolonged downtime of one full working day in the past five years due to system failures. The research also emphasises that prolonged downtime can cost anywhere from $10,000 per hour for smaller businesses to more than $5 million per hour for enterprises.
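To put those hourly figures in perspective, here is a minimal sketch of the arithmetic for a one-day outage. The hourly costs come from the report quoted above; the eight-hour working day and the `downtime_cost` helper are assumptions made purely for illustration, and real losses are unlikely to accrue at a constant rate.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Estimate the loss from an outage, assuming a constant hourly cost."""
    if hours_down < 0 or cost_per_hour < 0:
        raise ValueError("hours_down and cost_per_hour must be non-negative")
    return hours_down * cost_per_hour


# Assumption: one "full working day" is treated as 8 hours.
WORKING_DAY_HOURS = 8

# Hourly figures quoted in the article.
small_business_loss = downtime_cost(WORKING_DAY_HOURS, 10_000)
enterprise_loss = downtime_cost(WORKING_DAY_HOURS, 5_000_000)

print(f"Smaller business, one working day: ${small_business_loss:,.0f}")
print(f"Enterprise, one working day:       ${enterprise_loss:,.0f}")
```

Under these assumptions, a single lost working day costs a smaller business $80,000 and an enterprise $40 million.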
This has prompted organisations to embrace a comprehensive Disaster Recovery plan. Overall awareness of the need for a well-defined Disaster Recovery plan has also grown manifold in recent years.
What is cloud-based disaster recovery?
Cloud-based Disaster Recovery is a mechanism that helps organisations restore critical systems after a disaster. It also allows remote access to servers and systems in a secure virtual environment. Most businesses devote 2-4 percent of their IT resources to disaster recovery planning, with some organisations devoting up to 25 percent of their IT expenditure to reducing infrastructure risks.
Here's a simple cloud disaster recovery plan illustrated in eight steps to assist your organisation in developing an efficient disaster recovery strategy:
1. Know your infrastructure and risks involved
All organisations need to assess their IT infrastructure, which consists of assets, equipment, and data. It is also critical to determine where these assets, equipment, and data are stored and what they are worth.
After evaluating the assets and the risks involved, such as natural disasters, data theft, and power outages, organisations can design the DR plan to minimise the effects of these disasters.
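One common way to structure this assessment is a simple risk register that ranks each threat by likelihood multiplied by impact. The sketch below is illustrative only: the `Risk` class, the 1-to-5 scales, and the sample scores are all assumptions, not figures from any real assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    asset: str
    threat: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Likelihood x impact is a common, if simplistic, prioritisation heuristic.
        return self.likelihood * self.impact


def rank_risks(risks: list) -> list:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries using the threats named in the article.
register = [
    Risk("primary data centre", "natural disaster", likelihood=1, impact=5),
    Risk("customer database", "data theft", likelihood=3, impact=5),
    Risk("on-premises servers", "power outage", likelihood=4, impact=3),
]

ranked = rank_risks(register)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.threat} -> {risk.asset}")
```

The highest-scoring entries are the ones the DR plan should address first.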
2. Conduct a business impact analysis
A Business Impact Analysis helps to understand the limits of an organisation’s business operations once disaster strikes. Two parameters play a significant role in assessing the situation: Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
RTO is the maximum amount of time an application can stay offline before business operations are disrupted. RPO is the maximum amount of data loss, measured as a span of time, that an organisation can tolerate from an application after a significant disaster.
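The two objectives can be checked against a proposed setup with very little code. The following is a minimal sketch, assuming that the worst-case data loss equals the interval between backups and that the recovery time can be estimated in advance. The `Objectives` class, the `meets_objectives` function, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # maximum tolerable time offline
    rpo: timedelta  # maximum tolerable window of lost data


def meets_objectives(
    objectives: Objectives,
    backup_interval: timedelta,
    estimated_recovery_time: timedelta,
) -> dict:
    """Compare a proposed DR setup with the stated RTO and RPO.

    Assumption: a disaster just before the next backup loses up to one
    full backup interval of data, so the interval must not exceed the RPO.
    """
    return {
        "rpo_met": backup_interval <= objectives.rpo,
        "rto_met": estimated_recovery_time <= objectives.rto,
    }


# Hypothetical targets: back online within 4 hours, lose at most 1 hour of data.
targets = Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))

nightly = meets_objectives(targets, timedelta(hours=24), timedelta(hours=3))
frequent = meets_objectives(targets, timedelta(minutes=30), timedelta(hours=3))

print("Nightly backups:  ", nightly)
print("30-minute backups:", frequent)
```

In this example the nightly backup meets the RTO but not the RPO, which is the kind of gap a business impact analysis is meant to expose.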
3. Create a DR plan based on your RPO and RTO
After determining the RPO and RTO, the focus can shift towards designing a system that meets the organisation’s DR goals. To put the DR plan into action, an organisation can choose from the following options:
Backup and Restore
Pilot Light Approach
Warm Standby Approach
Multi-site Approach
Multi-cloud Approach
All of these DR approaches can be explained using an AWS cloud-based DR system as an example.
- Backup and restore: This is a disaster recovery mechanism that works by taking data backups periodically, with the backup frequency driven by the RPO. For instance, a database whose data changes frequently, such as energy consumption readings during peak hours, needs a shorter RPO and therefore more frequent backups, whereas a largely static database can be handled with a longer RPO.
- Pilot light approach: This approach takes its name from the pilot light in a gas heater, a small flame that is always lit and can ignite the entire furnace. Similarly, in the Pilot Light approach the database server on the cloud is always kept active for incremental backups, like the small flame. The replica application and caching server environments are kept in standby mode, like the furnace. When disaster strikes, the application and caching servers are activated and users are rerouted to the cloud environment through elastic IP addresses.
- Warm standby approach: In this approach, a scaled-down copy of the environment is kept running on the cloud. Whenever an on-premises data centre fails, multiple EC2 instances are brought in so that the application and cache environments can handle the production load. With the help of Amazon Route 53, traffic is rerouted quickly with minimal downtime.
- Multi-site approach: This is often considered the optimum technique. When a disaster occurs, all the traffic directed to the on-premises servers is rerouted to the AWS cloud, where multiple EC2 instances handle full production capacity.
- Multi-cloud approach: In this method, there is a primary cloud provider and a backup cloud provider. For example, if AWS is the primary, Azure can serve as the DR site. This ensures that whenever the primary cloud is down, your systems can still function in the secondary cloud. Just a few days ago, AWS was down for a few hours, and the services of many sites like Netflix went down with it. Multi-cloud disaster recovery addresses this type of challenge.
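As a rough illustration of how the RTO can drive the choice between the first four approaches, the sketch below picks the cheapest option whose typical recovery time fits within the RTO. The recovery times in `STRATEGIES` are illustrative assumptions, not vendor guarantees, and the cost ordering is a general rule of thumb. The multi-cloud approach is left out because it concerns where the DR site lives rather than how warm it is kept.

```python
from datetime import timedelta
from typing import Optional

# (strategy, assumed typical recovery time), ordered from cheapest to most
# expensive. The durations are placeholders for illustration only.
STRATEGIES = [
    ("backup and restore", timedelta(hours=24)),
    ("pilot light", timedelta(hours=1)),
    ("warm standby", timedelta(minutes=10)),
    ("multi-site", timedelta(minutes=1)),
]


def cheapest_strategy(rto: timedelta) -> Optional[str]:
    """Return the cheapest strategy whose assumed recovery time fits the RTO."""
    for name, typical_recovery in STRATEGIES:
        if typical_recovery <= rto:
            return name
    return None  # no listed strategy can meet such a tight RTO


for rto in (timedelta(days=2), timedelta(hours=4), timedelta(minutes=10)):
    print(f"RTO {rto}: {cheapest_strategy(rto)}")
```

A looser RTO leaves room for a cheaper strategy, while a tighter one pushes the organisation towards environments that are kept running at all times.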
4. Rely on the right cloud partner
Once a suitable DR plan has been chosen, it is important to look for a trusted cloud service provider that will help execute it. The factors to consider when choosing a cloud service provider are reliability, speed of recovery, simplicity of setup and recovery, scalability, and security compliance.
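One way to compare candidates on these five factors is a weighted scorecard. The sketch below is only an illustration: the weights, the 1-to-5 ratings, and the provider names are all hypothetical, and each organisation would set its own.

```python
# Assumed weights (summing to 100) for the five factors named in the article.
WEIGHTS = {
    "reliability": 30,
    "speed_of_recovery": 25,
    "simplicity": 15,
    "scalability": 15,
    "security_compliance": 15,
}


def weighted_score(ratings: dict) -> int:
    """Sum weight x rating over all factors; ratings are assumed to be 1 to 5."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[factor] * ratings[factor] for factor in WEIGHTS)


# Hypothetical ratings for two made-up providers.
candidates = {
    "Provider A": {"reliability": 5, "speed_of_recovery": 4, "simplicity": 3,
                   "scalability": 4, "security_compliance": 5},
    "Provider B": {"reliability": 4, "speed_of_recovery": 5, "simplicity": 4,
                   "scalability": 3, "security_compliance": 4},
}

scores = {name: weighted_score(r) for name, r in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score} out of 500")
print("Highest score:", best)
```

A scorecard like this does not replace due diligence, but it makes the trade-offs between candidates explicit.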
5. Ensure cloud DR infrastructure is in place
After consulting with a cloud disaster recovery partner, the organisation can work with the provider to put the plan into action and build the DR infrastructure. For trouble-free business operations, the DR setup must meet the RTO and RPO requirements.
6. Put your disaster recovery plan on paper
It is critical to establish a quality procedure or process flowchart with explicit instructions for everyone involved in disaster recovery. When a disaster strikes, each person should be prepared to assume responsibility for his or her role in the disaster recovery process.
7. Simulate real failures
You want to make sure that the disaster recovery plan can actually be executed when there is a real failure. In many cases, organisations realise that key components are missing only when disaster really strikes and the plan cannot be completed. This is why many organisations simulate failures.
For example, many companies shut down all the servers in a given data centre and expect the disaster recovery plan to kick in and handle the situation. If there are any issues, the servers can be brought back up, and the simulation will have exposed the gaps in the plan.
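The idea behind such a drill can be shown with a toy, in-memory model. In the sketch below, `Site`, `route`, and `run_drill` are hypothetical names that stand in for real infrastructure. The drill takes the primary site offline, checks whether traffic reaches the standby, and always brings the primary back up afterwards.

```python
class Site:
    """A stand-in for a data centre or cloud environment."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy


def route(primary: Site, standby: Site) -> str:
    """Return the name of the site that would serve traffic right now."""
    if primary.healthy:
        return primary.name
    if standby.healthy:
        return standby.name
    raise RuntimeError("no healthy site available")


def run_drill(primary: Site, standby: Site) -> bool:
    """Simulate a primary outage and report whether failover worked."""
    primary.healthy = False  # simulate shutting down the primary site
    try:
        return route(primary, standby) == standby.name
    except RuntimeError:
        return False  # the drill exposed a gap in the plan
    finally:
        primary.healthy = True  # always bring the primary back up


primary = Site("on-premises")
good_standby = Site("cloud-dr")
broken_standby = Site("cloud-dr", healthy=False)

print("Drill with working standby passed:", run_drill(primary, good_standby))
print("Drill with broken standby passed: ", run_drill(primary, broken_standby))
```

A real drill involves actual servers, DNS, and data, but the structure is the same: inject the failure, observe the outcome, and restore the original state.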
8. Revisit and test your DR plan often
The final step for the organisation is to test its DR plan regularly to ensure there are no loopholes. Its reliability can be judged only after testing.
Since organisations go through changes in plans, people, and management, it is essential to practise the DR plan after each change and be ready for any crisis.
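A simple check can flag when the plan is due for another test. The sketch below assumes two triggers: a change made after the last test, or a last test older than a chosen maximum age. The `needs_retest` function and the 180-day default are hypothetical choices for illustration, not a recommendation from any standard.

```python
from datetime import date, timedelta


def needs_retest(
    last_tested: date,
    last_changed: date,
    today: date,
    max_age: timedelta = timedelta(days=180),  # assumed policy, adjust as needed
) -> bool:
    """Return True if the DR plan should be tested again."""
    changed_since_test = last_changed > last_tested
    test_too_old = (today - last_tested) > max_age
    return changed_since_test or test_too_old


today = date(2024, 6, 1)

# Tested recently, with no changes since: no retest needed yet.
print(needs_retest(date(2024, 4, 1), date(2024, 3, 1), today))

# A change was made after the last test: retest.
print(needs_retest(date(2024, 4, 1), date(2024, 5, 15), today))

# No changes, but the last test is more than 180 days old: retest.
print(needs_retest(date(2023, 9, 1), date(2023, 8, 1), today))
```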
Choosing the right cloud partner is one of the key responsibilities of every organisation in ensuring that disaster recovery best practices are implemented.
(Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the views of YourStory.)