One of the most important parts of a good DR (Disaster Recovery) plan is understanding how important each application is to your organization. Only after you understand that can you begin to assign each application to a Tier level.

Tier levels are the timeframes a DR Manager creates to determine when specific applications should be recovered in the event of a disaster. The business determines RTOs (Recovery Time Objectives) and RPOs (Recovery Point Objectives) for every application in the landscape. However, to make all of those applications recoverable within their RTOs and RPOs, the DR Manager needs to define attainable timeframes in which to do so.

The first thing to make sure you understand is how your organization defines Critical. Critical applications are the ones that are needed within a certain timeframe; without them, the organization will fail. The organization may not fail at exactly that point in time, since there may be a buffer during which the applications are not yet required, but the RTO serves as a guideline for when the business would prefer to have the application back.

So, once Critical applications have been identified and upper-level management has defined a timeframe, the DR Manager must determine Tier levels for the applications that are deemed Critical. Let’s consider an organization that determines Critical applications need to be back within a 5-day timeframe. That would mean any application deemed Critical would have an RTO that falls within that 5-day window. If an application's RTO does not fall within the window, the application should not be considered Critical.
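A minimal Python sketch of this rule follows, assuming RTOs are recorded in hours. The `Application` class, the `is_critical` function, and the sample application names and RTO values are hypothetical and exist only to illustrate the 5-day window check.

```python
from dataclasses import dataclass

# Critical window set by upper-level management (5 days, as in the example).
CRITICAL_WINDOW_HOURS = 5 * 24


@dataclass
class Application:
    name: str        # hypothetical application name
    rto_hours: int   # RTO set by the business, assumed to be in hours


def is_critical(app: Application, window_hours: int = CRITICAL_WINDOW_HOURS) -> bool:
    """An application is Critical only if its RTO falls within the window."""
    return app.rto_hours <= window_hours


# Hypothetical portfolio, for illustration only.
apps = [
    Application("order-entry", 12),
    Application("claims", 48),
    Application("intranet-wiki", 240),
]

for app in apps:
    label = "Critical" if is_critical(app) else "Not Critical"
    print(f"{app.name}: RTO {app.rto_hours}h -> {label}")
```

Here the order-entry and claims applications fall inside the 120-hour window, while the wiki, with a 240-hour RTO, does not qualify as Critical.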

So, we have a 5-day window in which to restore all the Critical applications. Where do we go from here? Should we just say that we will have everything back within those five days? No. The better approach is to break this 5-day window into shorter timeframes, or Tiers. The first Tier would be for the most critical applications in your environment. This could be a customer-facing application that is required for placing orders, or it could be a claims system that allows claims adjusters to enter claim information for customers. It could be any type of application that is important to driving the business.

Either way, your most critical applications go in the first Tier. What should the timeframe be for restoring the most critical applications in the organization? Generally, people would say that this should be measured in hours. However, depending on the type of environment you have, and particularly if you use a colocation facility that does not have failover capabilities, this timeframe could be anywhere from a couple of hours to 24 or even 48 hours.

The main thing here is to make sure that only the most critical applications in your environment are included in the first Tier of Critical applications. As I wrote in a previous article (The Argument for a DR Guarantee), the DR Manager should look at what the business has done to justify treating these applications as Critical. Are they up to date with regard to OS, database size, or even virtualization of the servers? If not, you should work to make sure that these applications are not classified as Tier 1, or possibly not even as Critical.

Now, let’s say you decide that you really only want 3 Tier levels: Tier 1 (Extremely Critical), Tier 2 (Remainder of Critical), and Tier 3 (Remainder of applications). Using the 5-day Critical window, everything Extremely Critical would be recovered within the 24-hour timeframe agreed upon by you and the business. Tier 2 would be anything that needs to be recovered within 5 days, and the remaining applications would be recovered sometime after that, perhaps within 20 days. The problem with this approach is that people will say they cannot wait 5 days for their application, and during the BIA (Business Impact Analysis) phase of Business Continuity, they will claim that their application is Extremely Critical and has to be recovered within 24 hours.

While it is acceptable to have the majority of your Critical applications recovered within the first Tier’s timeframe, it is not a good solution to let the business move applications up simply because the next Tier’s timeframe does not work for them. My recommendation, and the way I defined Tier levels, is to base them on timeframes that make sense.

If I had 5 days to restore Critical applications in a colocation facility without failover capabilities, I would define Tier 1 as 24 hours. Tier 2 would be either 48 or 72 hours, depending on how many applications the business would “move” to Tier 1 because it felt 72 hours was too long, even though 48 hours would be fine. If that were the case and I needed Tier 2 to be 48 hours, I might add a Tier 3 at 72 hours and set my final Critical Tier at 5 days.
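The reasoning behind these finer-grained Tiers can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: RTOs and Tier timeframes are expressed in hours, the function name `assign_tier` is hypothetical, and an application is placed in the slowest Tier whose timeframe still meets its RTO. The sketch compares the three-Tier layout (Tier 1 at 24 hours, Tier 2 at 5 days) with the 24/48/72-hour and 5-day layout described above.

```python
from typing import Optional, Sequence

# Upper bounds, in hours, for each Critical Tier (index 0 = Tier 1).
COARSE_TIERS = [24, 120]           # Tier 1 = 24 hours, Tier 2 = 5 days
REFINED_TIERS = [24, 48, 72, 120]  # 24, 48, 72 hours, final Critical Tier = 5 days


def assign_tier(rto_hours: float, tier_bounds: Sequence[float]) -> Optional[int]:
    """Return the slowest Tier whose recovery timeframe still meets the RTO.

    Returns None if the RTO lies beyond the last Critical Tier (not Critical).
    Raises ValueError if even Tier 1 is too slow to meet the RTO.
    """
    if rto_hours > tier_bounds[-1]:
        return None  # outside the Critical window, so not a Critical application
    chosen = None
    for tier, bound in enumerate(tier_bounds, start=1):
        if bound <= rto_hours:
            chosen = tier  # this Tier still recovers the application within its RTO
    if chosen is None:
        raise ValueError(
            f"RTO of {rto_hours}h is shorter than Tier 1 ({tier_bounds[0]}h)"
        )
    return chosen


# A hypothetical application whose owner can tolerate 48 hours of downtime.
rto = 48
print("coarse:", assign_tier(rto, COARSE_TIERS))    # lands in Tier 1
print("refined:", assign_tier(rto, REFINED_TIERS))  # lands in Tier 2
```

Under the coarse layout, an application that can tolerate 48 hours has to go into Tier 1, because Tier 2's 5-day timeframe misses its RTO. The finer-grained layout gives it a 48-hour Tier to land in, which relieves the pressure to push applications into Tier 1.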

As you can see, it is important for you, as the DR Manager, to take all of this into consideration when defining Tier levels. Naturally, everyone in the business wants every one of their applications recovered as soon as possible. However, it is your job to ensure not only that you keep the business happy by aligning each application with its importance to the organization, but also that the Tier levels you define make sense to the business and are attainable by IT in the event of a true disaster.