For some reason, your data center just went down. It’s time to declare a Disaster. Or is it? Many individuals within an organization, most of whom are unfamiliar with Disaster Recovery, let alone the processes and policies you have in place, are quick to decide that a Disaster should be declared. But do they know what has actually occurred within the data center, and how it affects bringing your infrastructure and systems back online?
All too often, individuals who do not have a full understanding of events, processes, or even the makeup of your colocation will proclaim to know what is best for the organization. Let’s say you just had an Emergency Power Off (EPO) event that took down everything within your environment, from networking, to storage, to your Citrix farm, to all servers and hence, all applications. Is it time to declare a disaster and restore at your colocation?
What if your colocation only has enough compute, RAM, and storage for recovery of your Critical applications? Still think it’s a good idea to restore at your colocation? What if the recovery time of your applications within the colocation is slated to take 3 days while your primary data center can be restored within 24 hours? Still a good idea to restore elsewhere?
When writing your DR Policy, it is good practice to include a section on The Definition of a Disaster. In it, you will want to address the viability of your current data center. In the case of an EPO, is it better to attempt to recover your primary data center, or should you start restoring to your colocation? If the colocation provides failover capabilities, then restoring there may be the best decision. However, if your colocation, as mentioned before, has limited resources and you won’t be able to restore all your applications, then making sure that your primary data center can be recovered may be the better solution.
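One way to make this trade-off concrete in a DR Policy is to write the decision down as an explicit rule. The following is only a minimal sketch under stated assumptions, not a recommended policy: the names `RecoveryOption` and `choose_recovery_site` are hypothetical, and the rule itself (fail over only when the primary site is not viable, or when the colocation can host everything and do so sooner) is one possible reading of the questions above. The figures reuse the scenario from this post: 24 hours for the primary data center versus 3 days for a colocation that holds only the Critical applications, taken here as roughly 30% of the environment.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    hours_to_restore: float  # estimated time until applications are back online
    app_coverage: float      # fraction of applications the site can host (0.0 to 1.0)


def choose_recovery_site(primary: RecoveryOption,
                         colo: RecoveryOption,
                         primary_viable: bool) -> str:
    """Return the name of the site to recover at (illustrative rule only)."""
    # If the primary data center cannot be brought back, the colocation
    # is the only option, even if it covers only Critical applications.
    if not primary_viable:
        return colo.name
    # Assumption: fail over only when the colocation can host every
    # application AND would be back online sooner than the primary site.
    if colo.app_coverage >= 1.0 and colo.hours_to_restore < primary.hours_to_restore:
        return colo.name
    return primary.name


# Scenario from the post: EPO event, primary restorable within 24 hours,
# colocation needs 3 days and only has room for Critical applications.
primary = RecoveryOption("primary data center", hours_to_restore=24, app_coverage=1.0)
colo = RecoveryOption("colocation", hours_to_restore=72, app_coverage=0.3)
print(choose_recovery_site(primary, colo, primary_viable=True))
```

Your own policy will likely weigh other factors, such as contractual recovery time objectives, so treat the thresholds here as placeholders to be replaced with whatever your organization agrees on.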
Imagine an EPO event in which you could restore only your Critical applications within your colocation because of its limited environment. If a disaster had been declared, you would have had only your Critical applications back up and running, perhaps within a couple of days. As soon as the Disaster was declared, your hosting team would have needed to order additional hardware to account for the other 70% of the applications in your environment. It could take days, weeks, or more likely months until you were fully recovered. At that point, you would have needed to begin some sort of replication back to your primary data center, now acting as your colocation. Once replication caught up, you might need to take another outage to restore back to your primary data center.
Imagine what everyone would be saying about the lack of Disaster Recovery planning on your part if this occurred. Therefore, it is vital to include a section that outlines exactly what the definition of a disaster is. If it is the viability of the data center, spell out what that means. This could mean that you have redundant power and internet coming into the data center. It could cover having the correct HVAC conditions in your data center. It could reference having enough compute power to bring up most of the systems. In any case, having this documented will help to ensure that the decision to declare a Disaster is made correctly, following the DR Policy and everything that you have worked on.
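Spelling out viability can be as simple as a documented checklist that anyone can evaluate during an incident. The sketch below is a hypothetical illustration of that idea: the type `DataCenterStatus`, the functions `failed_criteria` and `is_viable`, and the 70% compute threshold are all assumptions chosen for the example, standing in for whatever criteria your own DR Policy defines.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "most of the systems" is read here as at least 70% of
# normal compute capacity. Replace with the figure in your own policy.
MIN_COMPUTE_FRACTION = 0.7


@dataclass
class DataCenterStatus:
    redundant_power: bool              # both power feeds available
    redundant_internet: bool           # both internet circuits available
    hvac_within_limits: bool           # temperature and humidity in range
    available_compute_fraction: float  # usable compute, 0.0 to 1.0


def failed_criteria(status: DataCenterStatus) -> List[str]:
    """List the viability criteria that the data center currently fails."""
    failures = []
    if not status.redundant_power:
        failures.append("redundant power")
    if not status.redundant_internet:
        failures.append("redundant internet")
    if not status.hvac_within_limits:
        failures.append("HVAC conditions")
    if status.available_compute_fraction < MIN_COMPUTE_FRACTION:
        failures.append("enough compute")
    return failures


def is_viable(status: DataCenterStatus) -> bool:
    """The data center is viable only when every documented criterion is met."""
    return not failed_criteria(status)


# Example: power has not been fully restored and half the compute is down.
after_epo = DataCenterStatus(
    redundant_power=False,
    redundant_internet=True,
    hvac_within_limits=True,
    available_compute_fraction=0.5,
)
print(is_viable(after_epo), failed_criteria(after_epo))
```

Returning the list of failed criteria, rather than a bare yes or no, gives whoever is making the declaration a record of exactly which part of the definition was not met.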