
Disaster Recovery: “More Than Just a Plan”



By

Michael Lay CBCP – Principal Consultant


Growing up in the BC era (Before Computers), I had both the blessing and the curse of working at a company during the dawn of computers in business processes. We experienced failed processors, hanging memory, failed data transmissions and faulty data backups, all resulting in a pile of rework to recover our process results. Today, the majority of businesses establish goals and objectives around creating a Disaster Recovery (DR) plan to avoid all that rework. Whether driven by SOX, HIPAA, PCI compliance, external auditors or other initiatives, the general comment is, “We need to have a DR plan.” Conventional wisdom suggests that, to recover from a disaster event, we need a plan. However, to create a DR plan, you must have some type of approach or strategy to focus resources and planning efforts. Trying to articulate that approach or strategy is where most efforts start to lose traction or shift to focus completely on technology recovery, losing sight of the business operations. But that is a subject for another day.

There are many approaches to outlining a DR Plan (DRP), but the best industry-accepted guidance can be found in NIST Special Publication 800-34 and the ISO/IEC 24762 standard, both of which outline a framework for a DRP. These frameworks cover many aspects of recovery, including policies, objectives, key personnel, emergency response, communications, exercising the DRP and maintaining the DRP. But that is just the beginning. Having a DRP brings a company to Level One recovery capabilities, which are a great start for a recovery posture, but only a start. There are five levels of recovery capabilities.


Level One – The basics

The basic DRP has these components to be aligned with industry best practices:


Frankly, most organizations stop at Level One DR capabilities, for reasons such as these:

  • We got our check in the box to meet our regulatory requirements. Why do we need any more effort?

  • That is all we will ever need. The chances of having a disaster are slim to none, and we will deal with it if something happens.

  • We do not have the staff to do more. It will cost too much.

Most of these are financial, bottom-line decisions, and they are revisited only when extenuating circumstances could have material financial implications or raise personal safety concerns.


Level Two – The next steps

What typically drives organizations to explore Level Two capabilities is an external audit firm conducting its due diligence. Having personally worked with Price Waterhouse, Ernst & Young and Deloitte & Touche, I can say they all want to know that your data is protected and can be recovered. So, Level Two is about data protection and recovery.

There are many media for data collection, storage, backup and retention. Historically, we backed up to tape or disk and replicated data to offsite storage centers; it is now prevalent to back data up to the cloud for protection and retention.

Level Two capabilities encompass the creation of a plan that identifies all data retention and protection requirements, coupled with the backup directives that initiate the process to protect the required data. These backup directives should be tested periodically to demonstrate physical data recovery. The plan should provide an annual schedule for data recovery testing and document each demonstrated recovery. It should also provide for testing new backup directives when they are created and adding them to the annual schedule.
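The annual testing schedule described above can be tracked with very little tooling. What follows is a minimal sketch in Python, not a prescribed implementation: the `BackupDirective` record, its field names, the sample directive names and the 365-day interval are all illustrative assumptions rather than anything defined by a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


# Hypothetical record for one backup directive; field names are illustrative.
@dataclass
class BackupDirective:
    name: str
    last_recovery_test: Optional[date]  # None means recovery was never demonstrated


def directives_due_for_test(
    directives: List[BackupDirective], today: date, max_age_days: int = 365
) -> List[str]:
    """Return the names of directives whose recovery has not been demonstrated
    within the last max_age_days, including directives never tested."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        d.name
        for d in directives
        if d.last_recovery_test is None or d.last_recovery_test < cutoff
    ]


# Sample data (made up for illustration).
directives = [
    BackupDirective("finance-db-nightly", date(2023, 3, 1)),
    BackupDirective("hr-fileshare-weekly", date(2024, 1, 15)),
    BackupDirective("new-crm-daily", None),  # newly created, not yet tested
]

due = directives_due_for_test(directives, today=date(2024, 6, 1))
print(due)
```

Treating a never-tested directive as overdue mirrors the point that new backup directives should be tested when they are created, not only at the next annual cycle.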


Level Three – Recovery Strategy

Level Three recovery capabilities are built from the results of the Business Impact Analysis (BIA). The BIA results provide a total inventory of the applications and tools required for all business processes. During this stage, IT team members outline the recovery strategy for each application or tool inventoried during the BIA. This outline documents the approach (high level, not a run book) needed to recover the application or tool. It is like a tabletop exercise, documented as the best strategy or approach for application recovery.


Level Four – Recovery Time Capabilities

During the BIA, we define the business requirements for Recovery Time Objective (RTO) and Recovery Point Objective (RPO), which are included in the Business Continuity Plan and the Disaster Recovery Plan. During this phase, IT must determine the Recovery Time Capabilities (RTC) for each application. RTC is defined as the time to provision the platform for application operations plus the time to restore the data from the latest recovery point, which together are what is required to return an application to normal operations. Level Four recovery capabilities are built from the results of the Level Three strategy sessions, and this is a good point to build the recovery run books for system restoration while recording times for the RTC. For the best results, this phase can usually be conducted in a lab that mirrors the primary operational domain.
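The RTC definition above is simple arithmetic: provisioning time plus data restoration time. As a rough sketch, assuming times are recorded in hours during run book testing, the calculation might look like the following. The `RecoveryMeasurement` class, its field names and the sample figures are hypothetical, and comparing RTC against the RTO is an assumption about how the recorded number would typically be used.

```python
from dataclasses import dataclass


# Hypothetical record of one timed recovery exercise; names are illustrative.
@dataclass
class RecoveryMeasurement:
    application: str
    provisioning_hours: float  # time to stand up the platform
    restore_hours: float       # time to restore data from the latest recovery point
    rto_hours: float           # business requirement captured in the BIA

    @property
    def rtc_hours(self) -> float:
        # RTC = platform provisioning time + data restoration time
        return self.provisioning_hours + self.restore_hours

    @property
    def meets_rto(self) -> bool:
        return self.rtc_hours <= self.rto_hours


# Sample figures (made up for illustration).
measurements = [
    RecoveryMeasurement("order-entry", 2.0, 3.5, 4.0),
    RecoveryMeasurement("payroll", 1.0, 2.0, 24.0),
]

# Hours by which demonstrated capability exceeds the business objective.
gaps = {
    m.application: m.rtc_hours - m.rto_hours
    for m in measurements
    if not m.meets_rto
}
print(gaps)
```

An application that shows up in `gaps` is one where the business expectation and the demonstrated capability disagree, which is a conversation to have before an event rather than during one.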


Level Five – Total Operational Recovery

Few organizations will ever achieve this level of recovery capabilities. However, it is necessary for some organizations, such as those where high-volume financial transactions take place, like a trading desk, or where personal health and safety can be impacted, like public transportation or a hospital system. Level Five capabilities are demonstrated by totally restoring your entire computing environment at an alternate facility and testing normal business operations. Difficult, but achievable with enough planning.


Where Do You Start?

The Disaster Recovery Institute International (DRII) has developed a comprehensive approach, creating a methodology that focuses on Business Continuity Management (BCM). BCM is a program that introduces industry-accepted terminology and defines processes and structured results that incorporate industry best practices focused on maintaining business continuity. The objective of BCM is to ensure an organization is resilient to potential threats or unplanned events that affect normal business operations. This objective is coupled with a strategy to reduce the impact of a threat, along with processes and procedures to recover from unplanned events that cannot be controlled or mitigated.

The DRII has developed professional practices focused on creating, implementing, and maintaining a formal BCM program. These practices are outlined on the www.drii.org website and detailed in the professional practice subject area overview below.


Professional Practice Subject Area Overview (taken from the DRII website)

1. Program Initiation and Management

Establish the need for a Business Continuity Management Program within the entity and identify the program components from understanding the entity's risks and vulnerabilities through development of resilience strategies and response, restoration and recovery plans. The objectives of this professional practice are to obtain the entity's support and funding and to build the organizational framework to develop the BCM program.


2. Risk Evaluation and Control

The objective of this professional practice is to identify the risks/threats and vulnerabilities that are both inherent and acquired which can adversely affect the entity and its resources or impact the entity's image. Once identified, threats and vulnerabilities will be assessed as to the likelihood that they would occur and the potential level of impact that would result. The entity can then focus on high probability and high impact events to identify where controls, mitigations or management processes are non-existent, weak or ineffective. This evaluation results in recommendations from the BCM Program for additional controls, mitigations or processes to be implemented to increase the entity's resiliency from the most commonly occurring and/or highest impact events.
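As an illustration of how likelihood and impact might be combined to find the high probability, high impact events, here is a minimal sketch. It is not part of the DRII text: the 1 to 5 scales, the multiplication of the two scores, the threshold of 12 and the sample risks are all assumptions, chosen because likelihood-times-impact scoring is a common convention.

```python
from typing import Dict, List, Tuple

# Hypothetical risk register: name -> (likelihood, impact), each on an assumed 1-5 scale.
risks: Dict[str, Tuple[int, int]] = {
    "regional power outage": (3, 4),
    "ransomware": (4, 5),
    "office flood": (1, 3),
}


def rank_risks(
    register: Dict[str, Tuple[int, int]], threshold: int = 12
) -> List[Tuple[str, int]]:
    """Score each risk as likelihood * impact and return those at or above
    the threshold, highest score first."""
    scored = [(name, likelihood * impact) for name, (likelihood, impact) in register.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in scored if score >= threshold]


priority = rank_risks(risks)
print(priority)
```

The risks that clear the threshold are the ones to examine first for missing, weak or ineffective controls.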


3. Business Impact Analysis

During the activities of this professional practice, the entity identifies the likely and potential impacts from events on the entity or its processes and the criteria that will be used to quantify and qualify such impacts. The criteria to measure and assess the financial, customer, regulatory and/or reputational impacts must be defined, accepted, and then used consistently throughout the entity to define the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for each of the entity's processes. The result of this analysis is to identify time sensitive processes and the requirements to recover them in the timeframe that is acceptable to the entity.


4. Business Continuity Strategies

The data that was collected during the BIA and Risk Evaluation is used in this professional practice to identify available continuity and recovery strategies for the entity's operations and technology. Recommended strategies must be approved and funded and must meet both the recovery time and recovery point objectives identified in the BIA. A cost benefit analysis is performed on the recommended strategies to align the cost of implementing the strategy against the assets at risk.


5. Emergency Response and Operations

This professional practice defines the requirements to develop and implement the entity's plan for response to emergency situations that may impact the safety of the entity's employees, visitors or other assets. The emergency response plan documents how the entity will respond to emergencies in a coordinated, timely and effective manner to address life safety and stabilization of emergency situations until the arrival of trained or external first responders.


6. Plan Implementation and Documentation

The Business Continuity Plan is a set of documented processes and procedures which will enable the entity to continue or recover time sensitive processes to the minimum acceptable level within the timeframe acceptable to the entity. In this phase of the Business Continuity Management Program, the relevant teams design, develop, and implement the continuity strategies approved by the entity and document the recovery plans to be used in response to an incident or event.


7. Awareness and Training Programs

In this professional practice, a program is developed and implemented to establish and maintain corporate awareness about BCM and to train the entity's staff so that they are prepared to respond during an event.



8. Business Continuity Plan Exercise, Audit and Maintenance

The goal of this professional practice is to establish an exercise, testing, maintenance and audit program. To continue to be effective, a BCM Program must implement a regular exercise schedule to establish confidence in a predictable and repeatable performance of recovery activities throughout the organization. As part of the change management program, the tracking and documentation of these activities provides an evaluation of the on-going state of readiness and allows for continuous improvement of recovery capabilities and ensures that plans remain current and relevant. Establishing an audit process will validate the plans are complete, accurate and in compliance with organizational goals and industry standards as appropriate.


9. Crisis Communications

This professional practice provides the framework to identify, develop, communicate, and exercise a crisis communications plan. A Crisis Communications plan addresses the need for effective and timely communication between the entity and all the stakeholders impacted or involved during the response and recovery efforts.


10. Coordination with External Agencies

This professional practice defines the need to establish policies and procedures to coordinate response, continuity and recovery activities with external agencies at the local, regional and national levels while ensuring compliance with applicable statutes and regulations.


These practice areas are not in a formally defined order; the most effective order will depend on where your organization is in the process of developing a business continuity strategy. This strategy will be the focus for the development of a DR plan that is aligned with business continuity requirements. Most organizations might begin with a Risk Assessment (RA) and a formalized Business Impact Analysis (BIA) to formally capture business operational requirements. Focusing a business continuity strategy on these business defined requirements will foster a more robust and cost-effective DR strategy and plan.

Depending on the size of your organization, it may be prudent to engage Certified Business Continuity Professionals, such as those at Core Insights, who are well versed in these subject areas, to develop the most appropriate Business Continuity Management strategy and assist in developing the most cost-effective DR plan to support your operational capabilities.


Remember, it is never too late to prepare for an unplanned event... until it happens.

