The criteria for operational resiliency provide the foundation for building
and maintaining a financial institution’s business continuity program. Traditionally,
business continuity planning included the recovery of business functions and
their supporting IT infrastructure. The demand for 24x7 high-availability
services requires a change in the way we define business continuity. The new
paradigm uses an operational resiliency model to define the business continuity
framework supporting day-to-day operations. The traditional disaster recovery
plans become a subset of the new business continuity process.
Millions of dollars are spent each year adding technology to ensure that financial institutions can maintain continuous operations. The belief is that you can “fix it with technology.” At least that is what technology providers would like you to believe. While technology is important, many other key success factors must be in place before financial organizations are safe from unanticipated operational disruptions.
Most operational disruptions can be prevented, and the rest mitigated to reduce their impact. The key is how an organization prepares itself and its partners to prevent unanticipated interruptions. The solution appears complex because too many organizations try to solve the problem at the wrong end of the process. Much like Total Quality Management and the Six Sigma Quality Process, operational resiliency has to be designed into every critical step and threaded throughout the organization at each critical hand-off point.
Management can play an important role at the outset of every new project by requiring that an operational resiliency strategy is defined and that funding is sufficient to ensure that the procedures, infrastructure, and planning are in place to meet the recovery time objectives for each critical process. Collaboration between organizations is also essential to define the interdependencies between people, processes, and technology. These relationships can be mapped by “following the data” through each process needed to deliver internal and external customer products and services.
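One way to picture “following the data” is as a walk over a dependency map. The sketch below is illustrative only: the process names, the DEPENDENCIES table, and the trace_dependencies helper are hypothetical and do not come from any particular tool. It lists every person, process, or technology component that a customer-facing service relies on, directly or indirectly.

```python
from collections import deque

# Hypothetical example data: each entry lists what that item directly depends on.
DEPENDENCIES = {
    "wire_transfers": ["payments_app", "operations_staff"],
    "payments_app": ["core_banking_db", "wan_link"],
    "core_banking_db": ["san_storage"],
    "operations_staff": [],
    "wan_link": [],
    "san_storage": [],
}


def trace_dependencies(start, graph):
    """Return everything `start` depends on, directly or indirectly,
    in breadth-first order (closest dependencies first)."""
    visited = {start}          # guards against cycles and repeats
    ordered = []
    queue = deque(graph.get(start, []))
    while queue:
        item = queue.popleft()
        if item in visited:
            continue
        visited.add(item)
        ordered.append(item)
        queue.extend(graph.get(item, []))
    return ordered


if __name__ == "__main__":
    for dependency in trace_dependencies("wire_transfers", DEPENDENCIES):
        print(dependency)
```

Each item in the resulting list is a hand-off point where a resiliency strategy and an owner should be identified.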
Not all organizations have the resources or knowledge to assess risks, mitigate their impact, and develop business continuity plans for operational resiliency and disaster recovery. The scope of managing this initiative is substantial, especially as there are many interdependent processes that can affect the success of each organization.
The Operational Resiliency Process
The Operational Resiliency Process consists of five phases: Resiliency Management; Operational Resiliency Assessments; Resiliency Strategies; Business Continuity Planning; and Validation. Each phase is designed to leverage the knowledge gained in the previous phase to support the overall Operational Resiliency Process.
Resiliency Management – This is the most important aspect of the Operational Resiliency Process. It provides the glue that allows financial organizations to meet their operational resiliency objectives while maintaining continuous, uninterrupted operations. If resiliency is not managed centrally, each organization will develop plans that do not support the overall needs of the institution. These plans are often in different formats and do not interface with one another.
The success of operational resiliency starts with the Board of Directors and senior management. Only they can ensure that the proper funding, direction, standards, and resources are available to implement a realistic continuity of operations program. Once the objectives are established, specialized expertise is needed to lay the framework for the entire company. This expertise is rare within financial organizations and often requires unbiased third-party expertise. Caution should be exercised when using third parties who are selling hardware or recovery services. Their recommendations will tend to include their own “must have” products or services, and ultimately will cost more than if business continuity and disaster recovery were part of the overall operational resiliency plan supporting day-to-day operations.
The management of resiliency information is a formidable task even when all other aspects of the program are working effectively. It requires expert continuity management software to assist with mitigating risks and planning an overall integrated solution that can be securely accessed by an unlimited number of users and is protected against unforeseen interruptions. The software should be the repository for, or have direct access to, all critical information needed for rapid response, notification, and recovery of all key processes for each financial institution. The protection of the continuity management software and its data should be given the highest priority, and the software should be operated from an off-site facility with replication so that it remains available during a disruption.
Operational Resiliency Assessment – One approach to understanding an organization’s operational resiliency is to perform a threats and vulnerabilities assessment followed by an operational impact analysis. The analysis should determine whether the business recovery time requirements for each critical application can be achieved by the technology infrastructure and operational procedures supporting each business function. The analysis should compare each organizational business and IT function against industry best practices from ITIL, COBIT, NSA, and others. The result should provide a gap analysis for critical risk areas including facilities, business processes, security, networking, communications, applications, data center operations, storage management, disaster preparedness, and process and documentation management. The analysis should provide strategic and tactical improvement recommendations for all risk areas that jeopardize operational resiliency.
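The recovery time portion of such a gap analysis can be sketched in a few lines. This is a minimal illustration, assuming each critical application has a required recovery time and an estimate of what the current infrastructure and procedures can actually deliver. The Application record, the recovery_gaps function, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    rto_hours: float         # recovery time the business requires
    achievable_hours: float  # recovery time current infrastructure can deliver


def recovery_gaps(apps):
    """Return (name, shortfall_hours) for every application that cannot
    meet its recovery time objective, largest shortfall first."""
    gaps = [
        (app.name, app.achievable_hours - app.rto_hours)
        for app in apps
        if app.achievable_hours > app.rto_hours
    ]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


# Made-up sample figures for illustration only.
SAMPLE_APPS = [
    Application("online_banking", rto_hours=2, achievable_hours=6),
    Application("atm_network", rto_hours=4, achievable_hours=3),
    Application("loan_origination", rto_hours=24, achievable_hours=48),
]

if __name__ == "__main__":
    for name, shortfall in recovery_gaps(SAMPLE_APPS):
        print(f"{name}: {shortfall} hours over objective")
```

Sorting by shortfall is only one possible ordering. An organization may prefer to rank gaps by business impact instead.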
Resiliency Strategies – Each business function should have resiliency strategies that define each critical process, its criteria for operational continuity, and the dependent staff and IT resources needed to maintain its required continuity level. This also includes the resiliency preparedness of internal and external organizations that provide resources and data needed to deliver customer services.
Resiliency Strategies should typically define why, who, what, when, and where so that previously identified risks and events can be mitigated. This translates to: why there is an interruption; who is responsible for responding; what they have to do; when they have to respond; and where they have to respond. This may appear overly simplified, and it is, but it is a great place to start because most organizations have not defined their critical processes from an operational resiliency perspective.
Business Continuity Planning – The Operational Resiliency Assessment and the Resiliency Strategies provide the basis for the Business Continuity Plan. Each business organization should define its critical resources, including critical applications and their recovery time objectives, as well as the need for data protection supporting each critical process. These criteria establish the priorities for IT to ensure that the supporting infrastructures are designed and maintained to comply with each business requirement.
The Business Continuity Planning performed by IT, commonly referred to as Disaster Recovery Planning, is often performed separately and not integrated with business resiliency planning initiatives. The separation of the business and IT initiatives for business continuity and disaster recovery planning is frequently the source of unsuccessful recovery initiatives. The lack of joint planning and collaboration often results in separate agendas that don’t coincide until annual testing uncovers the need for change. Operational resiliency fixes are then glued in at the back end of the process with limited success.
An effective business continuity plan includes all aspects of operational resiliency for each critical process under every operational condition, including catastrophic interruptions. The planning should include business and IT operations and cover a broad scope of topics, including emergency management, disaster assessment, recovery, testing, and business resumption. Each functional area needs to define its recovery teams and the tasks supporting each phase of the recovery process. Operational process documentation needs to be available and priorities established to minimize the impact of any unplanned event.
Validation – Once the business continuity plan is formalized, it is used for training and testing, and hopefully is never needed to support an actual event. During the training process, all members of the organization should be trained on the plan’s use and have an opportunity to review its content from an operational resiliency perspective. After their feedback is received, testing scenarios need to be developed to ensure that the recovery time objectives can be achieved.
During the Validation Phase every recovery process should be tested, and exceptions noted. There are many variations during the Validation Phase, but it is important that each phase is successfully completed, each business function is tested, and the data is recovered without loss of critical information. If any portion of the testing fails, it needs to be rerun until success is achieved.
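The pass/fail rule described above can be expressed as a simple check. This sketch assumes each recovery drill records the measured recovery time and whether any critical data was lost. The DrillResult record, the needs_rerun function, and the sample results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrillResult:
    process: str
    rto_minutes: int       # recovery time objective for the process
    measured_minutes: int  # recovery time observed during the drill
    data_loss: bool        # True if critical information was lost


def needs_rerun(results):
    """Return the processes whose drill missed the recovery time
    objective or lost critical data, in the order they were tested."""
    return [
        result.process
        for result in results
        if result.measured_minutes > result.rto_minutes or result.data_loss
    ]


# Made-up sample results for illustration only.
SAMPLE_RESULTS = [
    DrillResult("payments", rto_minutes=60, measured_minutes=45, data_loss=False),
    DrillResult("call_center", rto_minutes=120, measured_minutes=180, data_loss=False),
    DrillResult("statements", rto_minutes=240, measured_minutes=200, data_loss=True),
]

if __name__ == "__main__":
    for process in needs_rerun(SAMPLE_RESULTS):
        print(f"Rerun required: {process}")
```

Any process on the returned list is retested after corrective action until it drops off the list.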
The Challenge for Success
Operational resiliency assurance is complicated. It requires a systematic approach and discipline similar to Total Quality Management, where the product is the delivery of uninterruptible financial services. Too many organizations try to solve the problem with reactive fixes as opposed to integrating operational resiliency attributes into every aspect of each critical process. The interdependencies between key functions need to be well understood, and the supporting IT infrastructure has to be capable of delivering on the stated business objectives. Success will come slowly, and it requires teamwork, expertise, process standardization, and management’s involvement.
About the Author
Bob Burns, Chairman, CEO and Co-founder of EverGreen Data Continuity, is a strong advocate of the Operational Resiliency Process as the framework for Business Continuity Management. Bob is the founder of NetVault storage management software, and was Vice President of CommVault Systems after serving more than 30 years at AT&T Bell Labs managing IT Operations and serving as an AT&T National Baldrige Quality Award Auditor.