QinetiQ Trusted Information Management
|
Date: | October 21, 2002 |
To: | Jennifer J. Johnson, Secretary, Board of Governors of the Federal Reserve System [Docket No. R-1128]
Office of the Comptroller of the Currency [Docket No. 02-13] Jonathan G. Katz, Secretary, Securities and Exchange Commission [File No. S7-32-02] |
From: | Carl B. Jackson, Vice President, QinetiQ Trusted Information Management, Inc. (QinetiQ-TIM) |
RE: | Comments - Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U. S. Financial System |
Ladies and Gentlemen:
QinetiQ Trusted Information Management appreciates this opportunity to comment on the "Interagency White Paper on Sound Practices to Strengthen the Resilience of the U. S. Financial System" (the "White Paper"). QinetiQ Trusted Information Management, Inc. ("QinetiQ-TIM") is a provider of information security and continuity planning professional services with international clients in the core clearing and settlement organization sector.
Because of my background as a former commissioned National Bank Examiner with the OCC and as the Continuity Planning Service Line Leader for a Big Four accounting firm, my management asked me to contribute comments on the White Paper. The comments below include a background description and tables that contain selected sections of the White Paper together with some additional background materials on QinetiQ-TIM and company management.
We appreciate the opportunity to present our views and are committed to working with the Agencies and the industry to reinforce strengths of the existing structure and to bring about changes that will benefit the industry and its participants. Should you have questions or comments, please feel free to contact Carl Jackson at 281-802-8206 or by email at cbjackson@qinetiq-tim.com.
The Federal Reserve, the Office of the Comptroller of the Currency, the Securities and Exchange Commission and the New York State Banking Department (the agencies) have been meeting with industry participants to analyze the lessons learned from the events of September 11, with a view towards strengthening the overall resilience of the U.S. financial system in the event of a wide-scale, regional disruption. Ensuring the resilience of critical financial markets requires that core clearing and settlement organizations and other firms that play significant roles in critical financial markets, many of which enjoy the benefits of operating out of major financial centers, will be able to perform their critical activities even in the event of a wide-scale, regional disruption.
Based on in-depth discussions with industry representatives, the agencies have reached certain conclusions regarding the necessity to assure the resilience of critical U.S. financial markets in the face of wide-scale, regional disruptions and identified a number of sound practices to strengthen the resiliency of the overall U.S. financial system and the respective U.S. financial centers. The paper discusses the views of the agencies on sound practices based on discussions with industry representatives on how the events surrounding September 11, 2001, have altered business recovery and resumption expectations for purposes of ensuring the resilience of the U.S. financial system and seeks comments on those views. Based on this extensive dialogue, the agencies have reached certain preliminary conclusions with respect to the factors affecting the resilience of critical markets and activities in the U.S. financial system; sound practices to strengthen financial system resilience; and an appropriate timetable for implementing these sound practices.
The Federal Reserve, the Office of the Comptroller of the Currency, and the Securities and Exchange Commission are publishing this draft white paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System for comment. The New York State Banking Department and the Federal Reserve Bank of New York also participated in drafting the paper. The agencies are seeking comment on the sound practices discussed below. Comments have been invited and are due to be received within 45 days of publication in the Federal Register. Comments are to be delivered to:
1) The Board of Governors of the Federal Reserve System: Please direct all comments concerning this paper to: Jennifer J. Johnson, Secretary, Board of Governors of the Federal Reserve System, 20th Street and Constitution Avenue, NW, Washington, D.C. 20551, or mailed electronically to regs.comments@federalreserve.gov. [Docket No R-1128]
2) OCC: Please direct all comments concerning this paper to: Office of the Comptroller of the Currency, 250 E Street, SW, Public Information Room, Mail Stop 1-5, Washington, DC 20219, Attention: Docket No. 02-13; fax number (202) 874-4448; or Internet address: regs.comments@occ.treas.gov. [Docket No. 02-13]
3) SEC: All comments concerning the paper should be submitted in triplicate to Jonathan G. Katz, Secretary, Securities and Exchange Commission, 450 5th Street, NW, Washington, DC 20549-0609. Comments can be submitted electronically at the following E-mail address: rule-comments@sec.gov. All comment letters should refer to File No S7-32-02; this file number should be included on the subject line if E-mail is used.
Overall Comments | |
|
We consider the Draft White Paper to be well thought out and executed. We consider the overall scope and breadth of the document appropriate and, in fact, regard it as a long overdue set of requirements for financial institutions' continuity planning.
We have three additional comments that deal with the following: (1) Emphasizing the need for financial institutions to take a business process approach (as opposed to a technological focus) to continuity planning; (2) Definition and utilization of the term `time-critical' as opposed to `critical'; and (3) Establishing an appropriate set of metrics to measure the long-term health and vitality of the institution's continuity planning business process. |
|
SUGGESTION: We suggest that financial institutions be encouraged to utilize business process models and mapping when prioritizing `time-critical' business processes and the resources that support them.
EXPLANATION: We consider it essential that the enterprise approach to continuity planning be process based. That is to say that the methodological approach to continuity planning should be business process (mega process, major process, sub-process) focused and include: (1) current state analysis of existing enterprise continuity planning components, business impact analysis, and risk management reviews; (2) mapping time-critical business processes to support resources (i.e., IT infrastructure, communications networks, facilities, external partners, people, etc. that support the identified processes); (3) analysis of the most appropriate recovery alternatives given time-critical resource mapping; (4) continuity and crisis management plan development, and development of short- and long-term testing, maintenance, training, and measurement processes; and (5) deployment of the planning and maintenance processes that were designed in (4) above. Failure to correctly identify, name, and prioritize business processes, with an emphasis on those that are time-critical, will lead to inefficiencies and eventual collapse of the overall continuity planning business process within the enterprise. Reference the Carl Jackson article for Auerbach, entitled "Reengineering the Continuity Planning Process" for more detail on this concept. |
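As an illustration only, step (2) above, mapping time-critical processes to their support resources, can be sketched in a few lines of Python. The process names, resources, and recovery time objectives below are hypothetical and are not taken from the White Paper or from any firm; the sketch simply assumes that each support resource inherits the shortest recovery time objective of any time-critical process that depends on it.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class BusinessProcess(NamedTuple):
    """One business process and the resources that support it."""
    name: str
    rto_hours: Optional[float]   # None means "not time-critical" (assumption)
    resources: Tuple[str, ...]   # IT, networks, facilities, partners, people


# Hypothetical example data, for illustration only.
EXAMPLE_PROCESSES: List[BusinessProcess] = [
    BusinessProcess("wire transfer processing", 2.0,
                    ("payments platform", "data network")),
    BusinessProcess("trade settlement", 4.0,
                    ("settlement system", "data network", "mail room")),
    BusinessProcess("marketing reporting", None, ("data warehouse",)),
]


def resource_recovery_targets(
    processes: List[BusinessProcess],
) -> Dict[str, float]:
    """Give each resource the tightest RTO of the time-critical
    processes it supports; other resources are left out."""
    targets: Dict[str, float] = {}
    for process in processes:
        if process.rto_hours is None:
            continue  # not time-critical, so outside the planning scope
        for resource in process.resources:
            current = targets.get(resource)
            if current is None or process.rto_hours < current:
                targets[resource] = process.rto_hours
    return targets


def recovery_order(
    processes: List[BusinessProcess],
) -> List[Tuple[str, float]]:
    """List resources by recovery target, shortest first (ties by name)."""
    targets = resource_recovery_targets(processes)
    return sorted(targets.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for resource, hours in recovery_order(EXAMPLE_PROCESSES):
        print(f"{resource}: recover within {hours} hours")
```

In this sketch the shared data network takes the two-hour target of the more demanding process, and the manual mail room operation appears in the recovery order alongside the automated systems, while the resource that supports only a non-time-critical process drops out of scope.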
|
SUGGESTION: We suggest changing the traditionally utilized term `critical' to `time-critical' in the Objective and Scope statement, as well as in several other places within the White Paper.
EXPLANATION: The concept of prioritizing `time-critical' business processes and the resources that support them, including IT infrastructure, communications networks (both voice and data), facilities, and external partners (trading, vendor, customer, outsourcers, public, etc.), must be emphasized. Defining and using the term `time-critical' is very useful to those who are attempting to determine which parts of the enterprise should receive continuity planning attention, and in what order. The term `time-critical' can easily be differentiated from `mission-critical' or simply `critical' functions. To illustrate, it can be said that all time-critical processes are mission-critical, but not all mission-critical processes are time-critical. Our experience is that between one-third and one-half of the business processes of an enterprise are truly time-critical. Narrowing the focus to time-critical processes and support resources streamlines the continuity planning process, making it more efficient to develop, test, maintain, and measure in the long run. Focusing attention on `time-critical' versus simply `critical' processes can spell the difference in the long-term success of the continuity planning processes. |
|
SUGGESTION: We suggest that the Agencies emphasize the development of both quantitative and qualitative measurement processes to be deployed along with the continuity planning infrastructures.
EXPLANATION: The reality is that many executive management groups have difficulty understanding the overall value add of the continuity planning processes within their organizations. This has led to the cyclical process exemplified by on-again, off-again continuity planning projects. What degree of value does continuity planning add to the enterprise's people, processes, technology and mission? Great question. It is sometimes difficult to get beyond the financial justification barrier. There is no question that justification of investment in continuity plan business processes based upon financial criteria is important, but it is not usually the financial metrics that drive recovery time windows (recovery time objectives). Continuity planning process metrics must be both quantitative and qualitative. It is the `customer service and customer confidence' issues that drive short recovery timeframes. Short recovery timeframes are typically the most expensive to implement because of the resource commitments involved in securing short-term recovery capabilities. Financial measurements do not always support short recovery windows. Implementation of an appropriate measurement system is crucial to success. Companies must measure not only the financial metrics, but also how the continuity planning business process adds value to the organization's people, processes, technologies, and mission. Focusing on financial measures alone has led to the on-again, off-again planning referred to earlier, and does not take into consideration the business interruption (customer service or lack of confidence) impacts of a disruption. |
The following table outlines comments to specific questions within the White Paper:
Specific Comments | |
Summary Data | Comments |
The agencies invite comments on the appropriate scope and application of the sound practices and implementation timetable discussed above, as well as other issues relevant to strengthening the resilience of the financial system in the face of wide-scale regional disasters. In particular the agencies invite comment in the following areas: | |
Scope of application. | |
|
Our view is that the Agencies have included all relevant markets within the breadth of the White Paper. |
|
The definition of core clearing and settlement organizations is clear and should not require further explanation. |
|
Yes. |
|
While difficult to define in Agency guidelines, there will likely be a large number of facts and/or circumstances that will need to be used to determine whether a firm plays a significant role as a core clearing organization. This information will only be derived following an appropriately conducted Business Impact Assessment (BIA) for each core enterprise. As part of the time-critical process mapping to support resources, including mapping of support provided by external partners, numerous organizations and organizational components will be identified as being time-critical and will therefore cause that organization to fall under the definition of core clearing organization or as a supporter of such operations. |
|
It is our opinion that basing the benchmark on a percentage of market share makes the most sense. The percentage approach will also relieve the Agencies of the need to reissue guidelines as market conditions change. |
|
At this point, we feel the benchmark should be applied uniformly across the industry group. |
In some market segments, there are geographic concentrations of primary and back-up facilities of firms with relatively small market shares. | |
|
Yes. Fortunately, the commercial hotsite vendors are geographically dispersed so that they can offer a minimum level of diversity for backup support for those smaller firms that are geographically concentrated. The eventual Agency Guideline may even cause the commercial vendors to diversify even more than they are presently. As an aside, the major firms in this industry group tend to be multi-national companies with several locations around the world where recovery operations could be organized. The issue will be, for them, the cost of planning for and acquiring appropriate backup resource support (i.e., communications circuits, hardware, personnel, facilities, etc.). |
|
This is a difficult question to answer by way of making sweeping generalizations. We feel that as the Agency Guideline goes into effect, many of the firms that play significant roles in critical markets will have to consider acquiring `hot' backup capabilities or consider shifting operations to partners, affiliates, etc., in the short-term following a significant disaster or disruption. Unfortunately, to accurately answer this question, much depends upon the BIA process that must take place within each firm. |
|
In order for the eventual Agency Guideline to be effective, we believe that firms that play significant roles in critical markets should be required to meet standards for continuity of operations. This will be an unpopular mandate, but this requirement really begins to get to the main point and reason for the Agency Guideline. |
|
Yes. |
|
No. For continuity planning purposes, a `disaster' is declared just as soon as it is determined that the resources that support time-critical processes will be `down' longer than the recovery time objective (RTO), as defined during the BIA process. Once it is determined that downtime will be longer than the RTO, then a `disaster' is declared and recovery activities and tasks are initiated. The anticipated length of the outage, beyond the RTO, is irrelevant. The focus should be on recovery of minimum time-critical operations within the recovery window and to continue to support those operations until the primary functionality is fully restored, no matter the length of time. |
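The declaration rule described above can be written as a one-line test. The following is a minimal sketch in Python; the function name and the use of hours as the unit are assumptions made for illustration, not part of the White Paper or of any standard.

```python
def should_declare_disaster(estimated_downtime_hours: float,
                            rto_hours: float) -> bool:
    """Return True once the expected outage exceeds the recovery time
    objective (RTO) set during the BIA. How far the outage runs past
    the RTO plays no part in the decision."""
    return estimated_downtime_hours > rto_hours


if __name__ == "__main__":
    # Hypothetical figures: a four-hour RTO and three outage estimates.
    for estimate in (3.0, 6.0, 400.0):
        print(estimate, should_declare_disaster(estimate, rto_hours=4.0))
```

A six-hour outage and a 400-hour outage produce the same answer against a four-hour RTO, which is the point of the comment: the anticipated length of the outage beyond the RTO does not change whether recovery is initiated.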
|
Two answers here. Yes from a mega-process standpoint. But each of these mega-processes (referred to as critical activities) has a number of major and sub-processes. Some of these major and sub-processes are time-critical and should be subject to continuity planning, and some are not. It would be impossible for the Agencies to accurately identify every time-critical mega, major and sub-process. This can only be done as part of the BIA process within each firm. The Agencies should therefore require that a business process BIA be conducted in order to ensure that each firm has identified all the time-critical activities that support operations. |
|
Defining materiality is mandatory in understanding how best to prioritize activities for recovery. However, given the differences in size among the firms involved, as well as in the `mission' of the firms (some have corporate earnings goals, others may have different goals), it is tricky to set one criterion across the industry. Perhaps establishing a framework for determining materiality would be a better approach. |
Sound practice seems to require firms that play significant roles in critical markets to establish recovery targets of four hours after an event for their critical activities. | |
|
Yes. RTOs of four hours or less require companies to make more substantial investments in continuity planning arrangements (i.e., communications circuits, hardware, facilities, software, management systems, etc.). For automated operations, mirrored processing using RAID technologies, failover processes, or even fully mirrored processing sites is really the only way to achieve less than four hour recoverability. This calls into question the requirement for `Continuous or High Availability' systems and processes.
Evolving with the birth of the web and web-based businesses is the requirement for 24x7 uptime. Traditional RTOs have disappeared for certain business processes and support resources that support the organization's web-based infrastructure. Unfortunately, simply preparing web-based applications for sustained 24x7 uptime is not the only answer. There is no question that application availability issues must be addressed, but it is also important to address the reliability and availability of other web-based infrastructure components, such as computer hardware, web-based networks, database file systems, web servers, and file and print servers, as well as to prepare for the physical, environmental, and information security concerns relative to each of these (See RMR above). One other point here is that non-automated operations (i.e., mail room, certain back-office activities, etc.) should also receive the same degree of care as automated processes. One of the lessons of 9/11 was that many firms had prepared recovery plans for computerized processes, but neglected manual processes. |
Similarly, sound practice seems to require core clearing and settlement organizations to establish recovery and resumption targets of two hours for critical activities. | |
|
When considering recovery for automated applications and processes, RTOs of less than eight hours, that is to say RTOs from one hour up to eight hours, tend to require substantial effort to achieve appropriate recovery alternative solutions. So the answer to this question is yes. As with the firms that play significant roles in critical markets, RTOs of two hours or less likewise require companies to make even more substantial investments in continuity planning arrangements (i.e., communications circuits, hardware, facilities, software, management systems, etc.). For automated operations, mirrored processing using RAID technologies, failover processes, or even fully mirrored processing sites is really the only way to achieve less than four hour recoverability, so in these cases some sort of automated failover capability would be required. This calls into question the requirement for `Continuous or High Availability' systems and processes. See discussion above for further thoughts on Continuous Availability.
Also relevant to this segment, non-automated time-critical processes should have the same level of attention as do automated processes. |
|
As a practical matter, RTOs will differ. RTOs for each of the firms within critical markets or for firms that play significant roles in critical markets will differ according to their current market circumstance, automated configurations, outsourcers, telecommunications vendors/suppliers, etc. There are numerous factors that would affect the RTO of each and every firm. |
|
Yes. The out-of-region expectations as presented in the White Paper appear appropriate. The challenge with mandating this type of requirement is that there will always be exceptional circumstances where the expectations are simply not appropriate, and when this occurs it will unfortunately detract from other important components of the Agencies' Guideline. It would seem better to present broad-based out-of-region guidelines within which the firms have the flexibility to select the most appropriate backup resources. |
|
Because the Agency Guideline is intended to address regional disruptions, it would seem logical that a minimum distance be set forth. The challenge is to make such a minimum distance requirement relevant for firms that are located in different geographic locations (e.g., New York City, Atlanta, Los Angeles, Anchorage, Honolulu, etc.). This is where setting a fixed minimum distance will most likely become contentious. Our experience is that there is no real fixed minimum distance, and that each firm must decide based upon several factors, including location of firm-occupied sites, location of hotsite vendors, distance that employees could comfortably or practically commute, etc. Our opinion is that a minimum distance cannot be effectively mandated. Guidelines for setting minimum distances so that affected firms could make informed decisions would be helpful, however. |
|
The factors needed to make such a decision include: location of firm-occupied sites, location of hotsite vendors, distance that employees could comfortably or practically commute, costs of maintaining and operating backup facilities, operational efficiencies or inefficiencies of offsite backup facilities, and hardware/software/telecommunications resource requirements, to name just a few. |
|
The Agencies should suggest what appropriate or acceptable practices should be; however, in our opinion it would be very difficult to mandate and then enforce hard requirements. |
|
Certainly. Even when considering the 9/11 event, operations that were affected in Manhattan could have been recovered in New Jersey, Boston, Philadelphia, or other relatively nearby localities. In the South and East the primary concern has always been hurricanes or tropical storms. In the West the primary regional concern is seismic in nature. Given these realities, many companies have prepared for out-of-region recovery, so this is nothing new. The challenge has been that no event of the magnitude or tragedy of 9/11 has really ever stressed the system. The point is that out-of-region or close-to-out-of-region alternative arrangements will continue to be viable under most sets of circumstances. Even with the 9/11 disruption, the hotsite vendors were all successful in helping those firms that needed assistance. On the other hand, should terrorists succeed in detonating a region-affecting nuclear or bio-chemical weapon(s), then close-to-out-of-region alternatives may well prove ineffective. |
|
Without getting specific, and from a continuity planning perspective, it is always preferable to try to recover to as close to an identical configuration as possible. The more that the recovery capability looks like the primary or original operation, the more smoothly the recovery will likely proceed. Therefore, it is very desirable to standardize communications protocols and other mechanisms that would make recovery as transparent as possible. Any encouragement the Agencies could provide manufacturers and/or industry groups in this area would help tremendously. |
|
Yes. It is our opinion that the timeframe for implementation should be coordinated among the firms and carried out quickly, given an appropriate upfront preparation time. Our experience is that once a medium to large organization (Fortune 500) decides to implement continuity planning for selected time-critical operations, the BIA and Current State Assessment activities take between three and four months. Recovery alternative decisions and plans can be written within just a few weeks (up to two to three months), with initial walk-through testing of these preparations beginning immediately following plan and continuity planning process deployment. Giving companies more time than this usually results in them slowing the implementation to fit the timeframe and often leads to failure, as other company priorities often pop up to take away attention from the usually unexciting continuity planning efforts. Our opinion is that the Agencies should give all affected firms six months' notification to begin, and another twelve months to come into full compliance. Additional time will usually not make efforts more efficient, and may indeed distract many from the immediacy of the effort. |
|
Yes. See comment above. We believe that the less time allowed the better. From notification by the Agencies to compliance (demonstrated by `meaningful testing') by the firms should be no longer than eighteen months, unless very special circumstances call for additional time. We can also observe that there is `never' a good time to perform continuity planning. Why? Because there is always a systems conversion coming, or a reorganization, or a personnel change, or some other event off into the future where it would seem logical to postpone continuity planning until it is completed. The fact is, that time really never seems to come, so postponement pending some future event should not be an option unless that event is exceptional or spectacular. |
|
Yes. See our comments above. We would suggest no more than eighteen months from notification of intent to initiation of meaningful testing. The term `meaningful testing' should be clearly defined. Exceptions should be made only in extreme cases and only for those firms who can prove that the additional time is really needed. The exception process should be rigorous enough so as to discourage application for waiver of the 18-month timeframe for frivolous reasons. |
|
No, and again we recommend the eighteen-month window suggested above, with the rigorous exception process for unusual circumstances. |
Specific Comments from White Paper High-Level Outline (Other) | |
Summary Data | Comments |
|
SUGGESTION: We suggest an alteration in the wording of this Objective to `time-critical' versus the traditionally utilized term `critical.'
EXPLANATION: The concept of `time-critical' business processes and the resources that support them, including IT infrastructure, communications networks (both voice and data), facilities, and external partners (trading, vendor, customer, outsourcers, public, etc.), should be emphasized. Defining and using the term `time-critical' is very useful to those who are attempting to visualize which parts of the enterprise should receive continuity planning attention and in what order. The term `time-critical' can easily be differentiated from `mission-critical' or simply `critical' functions. To illustrate, it can be said that all time-critical processes are mission-critical, but not all mission-critical processes are time-critical. Our experience is that between one-third and one-half of the business processes of an enterprise are truly time-critical. Narrowing the focus to time-critical processes and support resources streamlines the continuity planning process, making it more efficient to develop, test, maintain, and measure in the long run. Focusing attention on `time-critical' versus simply `critical' processes can spell the difference in the long-term success of the continuity planning processes. |
|
(Note: Same comment as above pertaining to terminology - critical versus time-critical) |
|
(Note: Same comment as above pertaining to terminology - critical versus time-critical) |
The agencies view these sound practices as being most applicable to organizations that present a type of systemic risk should they be unable to recover or resume critical activities that support critical markets. In this context, "systemic risk" includes the risk that the failure of one participant in a transfer system or financial market to meet its required obligations will cause other participants to be unable to meet their obligations when due, causing significant liquidity or credit problems and threatening the stability of financial markets.
The organizations that could present such systemic risk should they be unable to recover (i.e., complete) and resume (i.e., carry on) critical activities consist of core clearing and settlement organizations. |
Considered appropriate, no further comment. |
Other firms that play a significant role in critical financial markets also could contribute to systemic risk in payment and settlement systems should they be unable to recover critical activities. These organizations and key terms are described more fully below. | Considered appropriate, no further comment. |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
"Firms that play significant roles in critical financial markets" are those that participate in sufficient volume or value such that their failure to perform critical activities by the end of the business day could present systemic risk. The agencies believe that many if not most of the 15 - 20 major banks and the 5-10 major securities firms, and possibly others, play at least one significant role in at least one critical market. In the context of these sound practices, the agencies are considering the benefit of providing additional guidance (e.g., in terms of market-share or dollar-value thresholds) to help firms identify the category into which they fall for the specific activities they perform. | Considered appropriate |
For purposes of these sound practices, a "wide scale, regional disruption" is one that causes a severe disruption of transportation, telecommunications, power, or other critical infrastructure components across a metropolitan or other geographic area and its adjacent communities that are economically integrated with it; or that results in a wide scale evacuation or inaccessibility of the population within normal commuting range of the disruption's origin. | Considered appropriate |
A. Resilience of Critical Markets and Activities in U.S. Financial System Critical Markets. | |
The resilience of the U.S. financial system in the event of a wide-scale, regional disruption rests on the rapid recovery and resumption of critical financial markets defined above and the activities that support them. | Considered appropriate |
The rapid restoration of critical financial markets, and the avoidance of potential systemic risk, requires firms that play significant roles in those markets to recover business processes and functions sufficient to complete critical activities by the end of each business day. | Considered appropriate |
These critical activities are:
a) Completing pending large-value payment instructions; b) Clearing and settling material pending transactions; c) Meeting material end-of-day funding and collateral obligations necessary to assure the performance of items a) and b) above; d) Managing material open firm and customer risk positions, as appropriate and necessary to assure the performance of items a) through c) above; e) Communicating firm and customer positions necessary to assure the performance of items a) through d) above, reconciling the day's records, and safeguarding firm and customer assets; and f) Performing all support and related functions that are integral to the above critical activities. |
Considered appropriate |
The rapid resumption of critical financial markets requires that core clearing and settlement organizations are able to recover and resume within the business day the critical activities they perform that support the recovery of critical markets. | Considered appropriate |
B.
a) Processing new large-value payment instructions; b) Clearing and settling material new transactions; c) Managing material ongoing funding and collateral requirements necessary to assure the performance of items a) and b) above; d) Managing material ongoing firm and customer risk positions, as appropriate and necessary to assure the performance of items a) through c) above; e) Communicating changes in firm and customer positions necessary to assure the performance of items a) through d) above, reconciling the day's records, and safeguarding firm and customer assets; and f) Performing all support and related functions that are integral to the above critical activities. | Considered appropriate |
Sound Practices to Strengthen U.S. Financial System Resilience The agencies have identified the following sound practices for core clearing and settlement organizations and other firms that play significant roles in critical financial markets. | Considered appropriate |
The sound practices address the risks of a wide-scale, regional disruption and strengthen the resilience of the financial system. | Considered appropriate |
They also reduce the potential for a regional disruption to have an undue impact on one or more critical markets because primary and back-up processing facilities and staffs are concentrated in a particular geographic region. | Considered appropriate |
Core clearing and settlement organizations and other firms that play significant roles in critical financial markets should identify all the critical activities they perform in support of critical markets. | Considered appropriate |
2. Determine the appropriate recovery and resumption objectives. | Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
3. Maintain sufficient out-of-region resources to meet recovery and resumption objectives. | Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
4. Routinely use or test recovery and resumption arrangements. | |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Substitute the term 'recovery-time targets' with 'recovery time objectives (RTOs)' |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
|
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
|
Considered appropriate |
QinetiQ Trusted Information Management has 250 highly qualified professionals with an average of over 10 years of experience per person who bring to you the most important reference of all - a record of achievement and success all over the world.
QinetiQ Trusted Information Management understands the context for information security in your business and delivers assurance of vendor independence, global resources and extensive security expertise. We can leverage our trust relationship with you in your business by enabling you to meet standards and demonstrate best practice in protecting your information. We continue to work on that trust relationship by helping you manage the resulting security infrastructure and response capability.
QinetiQ TIM services are designed to give our clients innovative and sustainable solutions that meet their ongoing information security needs. Our exciting mix of Managed Security Services, Professional Consulting and Forensics expertise, plus an Education Program underpinned by world leading research teams will help clients achieve a durable and proven competitive advantage, now and in the future.
Our Professional Consulting Services range from Risk Assessment, Security Architecture, and Policy Development through to world-leading Penetration Testing and Vulnerability Analysis. Our Secure Operations Centers in both the UK and the US are protected to unparalleled standards that are recognized as military-strength, and our Research Facility is one of the largest and most successful in the world. With a long pedigree of managing information security at the highest level and providing leading-edge research into the future, QinetiQ Trusted Information Management provides a breadth and depth of Information Security Services unique in the industry.
CEO, John Holland
John Holland heads up QinetiQ plc's Trusted Information Management business and has additional responsibility for Finance market contracts. Based at the UK Headquarters, he has over 25 years' experience working in the computer industry. Prior to joining QinetiQ, he worked for Symantec, where he was the Vice President of Worldwide Security Services, responsible for Symantec's Professional, Education and Managed Security Service business. John joined Symantec from AXENT, where he had been the Vice President for Europe, Middle East and Africa, responsible for building the business in these areas.
President, Mike Corby, CISSP, CCP
Mike Corby has been a practicing IT professional for more than 30 years specializing in systems technology management and computer security. Mike joins QinetiQ from Netigy Corporation where he was Vice President of Global Security Practice. As a Technology Specialist, Systems Manager and CIO for large international corporations, and as the Consulting Director of hundreds of Systems and Technology projects for several diverse companies, he has put many theories and creative ideas into practice. He has worked as Practice Director for the IT consulting branch of Ernst & Young and CIO for a division of Ashland Oil and the Bain & Company Consulting Group. He is a certified Information Systems Security Professional (CISSP) and Certified Computer Professional (CCP). In 1994 he was awarded a lifetime achievement award by the Computer Security Institute.
Vice President - Continuity Planning/QAR, Carl Jackson, CISSP
Carl B. Jackson is a Certified Information Systems Security Professional (CISSP) and brings more than 25 years of experience in the areas of business continuity planning, information security, and information technology internal control reviews and audits. As the QinetiQ Trusted Information Management, Inc. Vice President-Continuity Planning, he is responsible for the continued development and oversight of QinetiQ-TIM (US) methodologies and tools in the enterprise-wide business continuity planning arena including network and eBusiness availability and recovery. Before joining QinetiQ-TIM, Mr. Jackson served as the continuity planning practice leader and Partner with Ernst & Young LLP. Mr. Jackson has extensive consulting experience with numerous major organizations in multiple industries, including: manufacturing, financial services, transportation, healthcare, technology, pharmaceutical, retail, aerospace, insurance, and professional sports management. He also has extensive business continuity planning experience as an information security practitioner, manager in the field of information security and business continuity planning, and as a university-level professor.
EMEA Director Global Service Development, David Lynas
David Lynas joined QinetiQ from Netigy Corporation, where he was Director of Global Security Practice. He is an internationally renowned Information Security professional with nearly 20 years' experience in the industry. In recent years David has specialized in designing strategic security architecture and has led many successful engagements for companies in the finance, healthcare, telecommunications, chemicals, manufacturing, and technology sectors all over the world, and for government in Europe and the USA. David continues to be in high demand as a presenter, having delivered sessions and keynotes on more than thirty different aspects of security to international conferences on four continents. He is the founder and chair of the prestigious annual COSAC conference.
EMEA Director Education, Christine Cambridge
Christine Cambridge has a software engineering background and has contributed to various information security research projects at QinetiQ for over 9 years. Specializing in requirement management, Christine later moved on to project manage a multi-million pound Information Security research program, acting as program conduit for international collaboration with other governments, research establishments and industry. She has contributed to the writing of military IT standards and participated in related Information Security steering groups. Over the past four years Christine has built one of the largest, most successful commercial Penetration Testing teams within the UK.
EMEA Director Consulting, John Sherwood BSc MSc CEng FBCS CMC CISSP
John Sherwood is the Director of Professional Services (EMEA) within QinetiQ Trusted Information Management and is one of the key players transforming that company into a global world-class provider of Information Security Services. He has 31 years' experience as an information-systems professional, the last 16 of which have been as a specialist in security of business information systems. The great majority of this security experience is in the banking and finance industry, but it also covers aerospace, chemicals, oil & gas, telecommunications and government. Previous appointments include: Practice Director EMEA at Netigy (Feb 2001 - Sept 2001); Executive Director Architecture at Netigy (Jan 2000 - Feb 2001); Managing Director at Sherwood Associates Limited (Feb 1990 - Dec 1999); Managing Director at Computer Security Consultants Limited (Jan 1989 - Jan 1990); Systems Support Manager at Computer Security Limited (September 1985 - December 1988); Principal Lecturer, Software Engineering & Digital Communications Systems, De Montfort University, Leicester (July 1983 - August 1985). John is also a visiting lecturer and external examiner at Royal Holloway College, University of London, and has published and lectured extensively around the world on a broad range of topics in the information security domain.
EMEA Director Forensics & Incident Response, Dave Bacon
David has 13 years' experience working in the Metropolitan Police Force, where he was involved in a number of investigations including the Lockerbie bombing, the Marchioness sinking and the Mardi Gras bomber. In 1994 he was recruited to the Computer Crime Unit at Scotland Yard, where he investigated computer systems and telephone networks for major offences of computer misuse (hacking) and telephone misuse (phreaking). Other areas of investigation include murder, kidnapping, rape and robbery, as well as specialized fraud investigations including Airline Ticket, Travel Agency and Mortgage fraud. He was recruited to DERA in 1998, where he formed the Data Recovery & Computer Forensics Laboratory. He is currently Director of Digital Investigations Services for QinetiQ, responsible for all QinetiQ Incident Response, Computer Forensic and Data Recovery services, as well as Secure Data Deletion and Special Projects offerings.
EMEA Director Research, Andy Bates
Andy Bates has over 20 years' experience in research and development in the Internet and IT security area. In the early 1980s at the UK Royal Signals and Radar Establishment (RSRE), he was involved in the pioneering research to develop the Internet. This was followed by several years as an Internet consultant supporting many advanced technology projects. He became involved in information security in the late 1980s when he took on responsibility for the research and development of a state-of-the-art multi-level secure distributed system. In recent years, within QinetiQ, he has been responsible for the strategic direction, management and growth of one of the largest world-class trusted information management research and development teams.
US Director of Technology, Peter Stephenson CPE, PCE
Peter Stephenson has lectured and delivered consulting engagements in eleven countries plus the United States on network planning, implementation, technology and security, and has written, co-authored or contributed to 14 books and several hundred articles in major trade publications. He began his professional career in 1965. Prior to joining QinetiQ Trusted Information Management, Inc. as U.S. Director of Technology, he was the Director of Technology for the global security practice of Netigy Corporation. He operated his own information security consulting practice for over 15 years. He is the developer of the Intrusion Management model, the VAST method for vulnerability assessment, the S-TRAIS standards-based security requirements engineering method, and the End-to-End Digital Forensic Analysis technique for conducting digital investigations over large networks. Mr. Stephenson currently is a PhD candidate at Oxford-Brookes University, where his research involves intrusion detection in a forensic environment. He holds the professional designations Certified Professional Engineer (CPE) and Professional Computer Engineer (PCE) from the International Society of Professional Engineers.
Vice President Solution Sales & Marketing, Americas, Keith Franz
Mr. Franz joined QinetiQ Trusted Information Management in March 2002 as Vice President Solution Sales and Marketing. Mr. Franz has over 28 years of sales and marketing experience and is a frequent speaker on a variety of security topics. His experience includes the use of security tools such as firewalls, intrusion detection and monitoring; Internet, mid-tier and mainframe system security; and security practices. Prior to joining QinetiQ Trusted Information Management, he served as Vice President - Sales for RedSiren Corporation. Mr. Franz also has held executive sales positions at such companies as Axent Technologies, Tartan, Inc. and Ansoft Corporation and sales positions with IBM and Wang Laboratories.
Vice President & Contracts Officer, Marie Fogarty
As Vice President & Contracts Officer for QinetiQ Trusted Information Management, Inc., Ms. Fogarty brings over 15 years of legal expertise to her role, with a concentrated focus on representing professional services companies in the high tech marketplace. Ms. Fogarty was the Assistant General Counsel at Netigy Corporation, responsible for the direct legal support of the Eastern Region, Global Security Consulting Practice Group and the Channel/Alliances Organization. While at Netigy, Ms. Fogarty provided a full range of legal support and business counseling on all information technology contracts in the commercial and government context, including customized professional services engagements, systems integration, outsourcing, and packaged network security offerings. Prior to joining Netigy, Ms. Fogarty held senior level legal positions at several major IT vendors including Electronic Data Systems, Sun Microsystems and MCI Systemhouse. At MCI Systemhouse Ms. Fogarty was US Corporate Counsel in charge of a team of legal professionals and responsible for over $1 billion in business on an annual basis. Prior to entering in-house practice, Ms. Fogarty was associated with Shearman & Sterling and Cadwalader, Wickersham & Taft in New York, where her practice focused on the representation of financial institutions and mergers & acquisition transactions involving Fortune 500 companies.
Ms. Fogarty received her law degree with honors in 1987 from Cornell University Law School in Ithaca, New York, and graduated magna cum laude in 1984 with a degree in English and Sociology from the State University of New York at Binghamton.
The initial version of this chapter was written for the 1999 Edition of the "Handbook of Information Security Management." Since then, eCommerce has seized the spotlight, and Web-based technologies are the emerging solution for almost everything! The constant throughout these occurrences is that no matter what the climate, fundamental business processes have changed little. And, as always, the focus of any business impact assessment is to assess the time-critical priority of these business processes. With these more recent realities in mind, this chapter has been updated and is now offered for your consideration.
The failure of organizations to accurately measure the contributions of the Continuity Planning (CP) process to their overall success has led to a downward spiraling cycle of the total business continuity program. The recurring downward spin or decomposition runs through planning, testing, maintenance, and decline, followed by re-planning, testing, maintenance, and decline, and so on.
In the past, Contingency Planning & Management (CPM)/Ernst & Young Continuity Planning Benchmark surveys have repeatedly confirmed that CP is ranked as being either extremely important or very important to executive management. The most recent 2000-2001 CPM/KPMG Continuity Planning Survey1 clearly supports this observation. This study indicates that a growing number of CP professional positions are migrating from the IT infrastructure to corporate or general management positions; however, CP reporting within the IT organization is still the norm. Approximately 40 percent of CP professionals currently report to IT, while around 30 percent report to corporate positions.
Continuity Planning Measurements
While the trends of this survey are encouraging, there is a continuing indication of a disconnect between executive management's perceptions of CP objectives and the manner in which they measure its value. Traditionally, CP effectiveness was measured in terms of a pass/fail grade on a mainframe recovery test or on the perceived benefits of backup/recovery sites and redundant telecommunications weighed against the expense for these capabilities. The trouble with these types of metrics is that they only measure CP direct costs and/or indirect perceptions as to whether a test was effectively executed. These metrics do not indicate whether a test validates the appropriate infrastructure elements or even whether it is thorough enough to test a component until it fails, thereby extending the reach and usefulness of the test scenario.
So, what are the correct measures to use? While financial measurements do constitute one measure of the CP process, others measure the CP's contribution to the organization in terms of quality and effectiveness, which are not strictly weighed in monetary terms. The contributions that a well-run CP Process can make to an organization include:
(1) Sustaining growth and innovation;
(2) Enhancing customer satisfaction;
(3) Providing people needs;
(4) Improving overall mission critical process quality; and
(5) Providing for practical financial metrics.
Just prior to the millennium, experts in organizational management efficiency began introducing performance process improvement disciplines. These process improvement disciplines have been slowly adopted across many industries and companies for improvement of general manufacturing and administrative business processes. The basis of these and other improvement efforts was the concept that an organization's processes (see the definition of Process in Table 1) constituted the organization's fundamental lifeblood and, if made more effective and efficient, could dramatically decrease errors and increase organizational productivity.
An organization's processes are a series of successive activities, and when they are executed in the aggregate, they constitute the foundation of the organization's mission. These processes are intertwined throughout the organization's infrastructure (individual business units, divisions, plants, etc.) and are tied to the organization's supporting structures (data processing, communications networks, physical facilities, people, etc.).
A key concept of the Process Improvement and Reengineering movement revolves around identification of process enablers and barriers (see Definitions in Table 1). These enablers and barriers take many forms (people, technology, facilities, etc.) and must be understood and taken into consideration when introducing radical change into the organization.
The preceding narration provides the backdrop for the idea of focusing on continuity planning not as a project, but as a continuous process, that must be designed to support the other mission-critical processes of the organization. Therefore, the idea was born of adopting a continuous process approach to CP, along with understanding and addressing the people, technology, facility, etc. enablers and barriers. This constitutes a significant or even radical change in thinking from the manner in which we have traditionally viewed and executed recovery planning.
Radical Changes Mandated
High management awareness and low CP execution effectiveness, coupled with the lack of consistent and meaningful CP measurements, call for radical changes in the manner in which we execute recovery planning responsibilities. The techniques used to develop the mainframe-oriented disaster recovery (DR) plans of the 1980s and 1990s consisted of five to seven distinct stages, depending upon whose methodology you were using, that required the recovery planner to:
(1) Establish a project team and a supporting infrastructure to develop the plans;
(2) Conduct a threat or risk management review to identify likely threat scenarios to be addressed in the recovery plans;
(3) Conduct a business impact analysis (BIA) to identify and prioritize time-critical business applications/networks and determine maximum-tolerable-downtimes;
(4) Select an appropriate recovery alternative that effectively addressed the recovery priorities and time-frames mandated by the BIA;
(5) Document and implement the recovery plans; and
(6) Establish and adopt an ongoing testing and maintenance strategy.
Shortcomings of the Traditional Disaster Recovery Planning Approach
The old approach worked well when disaster recovery of glass house mainframe infrastructures was the norm. It even worked fairly well when it came to integrating the evolving distributed/client-server systems into the overall recovery planning infrastructure. However, when organizations became concerned with business unit recovery planning, the traditional DR methodology was ineffective in designing and implementing business unit/function recovery plans. Of primary concern when attempting to implement enterprise-wide recovery plans was the issue of functional interdependencies. Recovery planners became obsessed with identification of interdependencies between business units and functions and the interdependencies between business units and the technological services supporting time-critical functions within these business units.
Losing Track of the Interdependencies
The ability to keep track of departmental interdependencies for CP purposes was extremely difficult and most methods for accomplishing this were ineffective. Numerous circumstances made consistent tracking of interdependencies difficult to achieve. Circumstances affecting interdependencies revolve around rapid rates of change that most modern organizations are going through. These include reorganization/restructuring, personnel relocation, changes in the competitive environment, and outsourcing. Every time an organizational structure changes, the CPs must change and the interdependencies must be reassessed, and the more rapid the change, the more daunting the CP reshuffling. Because many functional interdependencies could not be tracked, CP integrity was lost and the overall functionality of the CP was impaired. There seemed to be no easy answers to this dilemma.
Interdependencies Are Business Processes
Why are interdependencies of concern and what, typically, are the interdependencies? The answer is that, to a large degree, these interdependencies are the business processes of the organization and they are of concern because they must function in order to fulfill the organization's mission. Approaching recovery planning challenges with a business process viewpoint can, to a large extent, mitigate the problems associated with losing interdependencies, and also ensure that the focus of recovery planning efforts is on the most crucial components of the organization. Understanding how the organization's time-critical business processes are structured will assist the recovery planner in mapping the processes back to the business units/departments, supporting technological systems, networks, facilities, vital records, people, etc., and also will facilitate keeping track of the processes during reorganizations and/or during times of change.
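As a minimal sketch of this process-centered view, the mapping could be kept as one record per business process, so that a reorganization becomes a lookup rather than a re-discovery exercise. The process names, business units, and resources below are hypothetical and serve only to illustrate the idea; the chapter does not prescribe a particular tool or data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BusinessProcess:
    """One time-critical business process and what it depends on (illustrative only)."""
    name: str
    business_units: set[str] = field(default_factory=set)
    resources: set[str] = field(default_factory=set)


def processes_touching(processes, unit):
    """Return the sorted names of processes that involve the given business unit."""
    return sorted(p.name for p in processes if unit in p.business_units)


def apply_reorganization(processes, old_unit, new_unit):
    """Rename a business unit in place and report which processes need plan review."""
    affected = processes_touching(processes, old_unit)
    for p in processes:
        if old_unit in p.business_units:
            p.business_units.discard(old_unit)
            p.business_units.add(new_unit)
    return affected


# Hypothetical example data, not taken from the chapter.
processes = [
    BusinessProcess("Order fulfilment", {"Sales", "Warehouse"}, {"ERP", "WAN"}),
    BusinessProcess("Payroll", {"HR", "Finance"}, {"HR system"}),
    BusinessProcess("Customer billing", {"Finance", "Sales"}, {"ERP", "Billing app"}),
]

# A reorganization renames "Sales"; the affected processes are flagged for CP review.
needs_review = apply_reorganization(processes, "Sales", "Commercial Operations")
```

Because the process, rather than the department, is the unit of record, the mapping survives the rename and the planner only revisits the plans for the processes returned in `needs_review`.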
Traditional approaches to mainframe-focused disaster recovery planning emphasized the need to recover the organization's technological and communications platforms. Today, many companies have shifted away from technology recovery and toward continuity of prioritized business processes and the development of specific business process recovery plans. Many large corporations use the process reengineering/improvement disciplines to increase overall organizational productivity. CP itself should also be viewed as such a process. The following figure provides a graphical representation of how the enterprise-wide CP Process framework (Figure 1) should look:
Figure 1
This approach to Continuity Planning consolidates three traditional continuity-planning disciplines, as follows:
Route Map Profile and High-Level CP Process Approach
A practical, high-level approach to CP Process Improvement is demonstrated by breaking down the CP process into individual sub-process components as shown in the following figure (Figure 2):
Figure 2
The six major processes of the Continuity Planning business process are described below:
Figure 3 - Current State/Future State Visioning Overview
The Current State Assessment process also involves identifying and/or determining how the organization 'values' the CP process and measures its success (a step that is often overlooked, which often leads to the failure of the CP process). Also during this process, an organization's business processes are examined to determine the impact of loss or interruption of service on the overall business through performance of a business impact assessment (BIA). The goal of the BIA is to prioritize business processes and assign the recovery time objective (RTO) for their recovery as well as for the recovery of their support resources. An important outcome of this activity is the mapping of time-critical processes to their support resources (e.g., IT applications, networks, facilities, and communities of interest).
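The BIA outcome described above can be sketched as two small steps: rank processes by RTO, then carry each process's RTO down to the resources it depends on. The sketch below assumes that a shared resource inherits the shortest RTO of any process that uses it; that rule, and all process names, resources, and hour values, are illustrative assumptions rather than figures from this chapter.

```python
def prioritize(process_rto_hours):
    """Order processes from most to least time-critical (shortest RTO first)."""
    return sorted(process_rto_hours, key=lambda name: (process_rto_hours[name], name))


def resource_rtos(process_rto_hours, process_resources):
    """Assign each support resource the tightest RTO among the processes using it."""
    result = {}
    for process, resources in process_resources.items():
        rto = process_rto_hours[process]
        for resource in resources:
            # Keep the smallest RTO seen so far for this resource.
            result[resource] = min(rto, result.get(resource, rto))
    return result


# Hypothetical BIA results: RTO in hours per business process.
process_rto_hours = {
    "Wire transfers": 4,
    "Payroll": 24,
    "Customer statements": 72,
}

# Hypothetical mapping of processes to their support resources.
process_resources = {
    "Wire transfers": ["Payments app", "WAN"],
    "Payroll": ["HR system"],
    "Customer statements": ["Print facility", "WAN"],
}

recovery_order = prioritize(process_rto_hours)
resource_targets = resource_rtos(process_rto_hours, process_resources)
```

In this example the WAN supports both a 4-hour and a 72-hour process, so it receives the 4-hour target; this is the kind of dependency that the process-to-resource mapping is meant to surface.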
The CP Value Journey is a helpful mechanism for co-development of CP expectations by the organization's top management group and those responsible for recovery planning. In order to achieve a successful and measurable recovery planning process, the following checkpoints along the CP Value Journey should be considered and agreed upon. The checkpoints include:
The Value Journey Facilitates Meaningful Dialogue
This Value Journey technique for raising the awareness level of management helps to both facilitate meaningful discussions about the CP Process and to ensure that the resulting CP strategies truly add value. As will be discussed later, this value-added concept will also provide additional metrics by which the success of the overall CP process can be measured.
In addition to the approaches of CP Process Improvement, and the CP Value Journey mentioned above, the need to introduce people-oriented Organizational Change Management (OCM) concepts is an important component in implementing a successful CP process.
Mr. H. James Harrington, et al, in their book Business Process Improvement Workbook2, point out that applying process improvement approaches can often cause trouble unless the organization manages the change process. They state that, "Approaches like reengineering only succeed if we challenge and change our paradigms and our organization's culture. It is a fallacy to think that you can change the processes without changing the behavior patterns or the people who are responsible for operating these processes."3
Organizational change management concepts, including the identification of people enablers and barriers and the design of appropriate implementation plans which change behavior patterns, play an important role in shifting the CP project approach to one of CP Process Improvement. The authors also point out that, "There are a number of tools and techniques that are effective in managing the change process, such as pain management, change mapping, and synergy. The important thing is that every BPI (Business Process Improvement) program must have a very comprehensive change management plan built into it, and this plan must be effectively implemented."4
Therefore, it is incumbent on the recovery planner to ensure that, as the concept of the CP Process evolves within the organization, appropriate OCM techniques are considered and included as an integral component of the overall deployment effort.
A complement to the CP Process Improvement approach is the establishment of meaningful measures or metrics that the organization can use to weigh the success of the overall CP process. Traditional measures include:
Instead, the focus should be on measuring the CP Process contribution to achieving the overall goals of the organization. This focus helps us to:
The CP Balanced Scorecard includes a definition of the:
Figures 4 and 5 illustrate the Balanced Scorecard concept and show examples of the types of metrics that can be developed to measure the success of the implemented CP Process. Included in this Balanced Scorecard approach are the new metrics upon which the CP Process will be measured.
Following this Balanced Scorecard approach, the organization should define what the Future State of the CP Process should look like (see the preceding CP Value Journey discussion). This Future State definition should be co-developed by the organization's top management and those responsible for development of the CP Process infrastructure. Figure 3 illustrates the Current State/Future State Visioning Overview, a technique that can also be used for developing expectations for the Balanced Scorecard. Once the Future State is defined, the CP Process development group can outline the CP Process implementation critical success factors in the areas of:
These measures must be uniquely developed based upon the specific organization's culture and environment.
Figure 4 - Balanced Scorecard Concept
Figure 5 - Continuity Process Score Card Example
Evolving with the birth of the web and web-based businesses is the requirement for 24x7 uptime. Traditional recovery time objectives have disappeared for certain business processes and for the resources that support the organization's web-based infrastructure. Unfortunately, simply preparing web-based applications for sustained 24x7 uptime is not the only answer. Application availability issues must certainly be addressed, but it is equally important to address the reliability and availability of the other web-based infrastructure components, such as computer hardware, web-based networks, database file systems, web servers, and file and print servers, and to prepare for the physical, environmental, and information security concerns relative to each of these (see RMR above). Preparing the entirety of this infrastructure to remain available through major and minor disruptions is usually referred to as Continuous or High Availability.
Continuous Availability (CA) is not simply bought; it is planned for and implemented in phases. The key to a reliable and available web-based infrastructure is to ensure that each component of the infrastructure has a high degree of resiliency and robustness. To substantiate this statement, Gartner Research reports, "Replication of databases, hardware servers, web servers, application servers and integration brokers/suites helps increase availability of the application services. The best results, however, are achieved when, in addition to the reliance on the system's infrastructure, the design of the application itself incorporates considerations for continuous availability. Users looking to achieve continuous availability for their web applications should not rely on any one tool but should include the availability considerations systematically at every step of their application projects."7
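The effect of replication on availability can be illustrated with simple arithmetic. The following is a minimal Python sketch, not a method taken from the Gartner report: it assumes that components fail independently, that the service needs every tier to be up, and that any one replica within a tier is sufficient. The component names and the 99 percent availability figures are hypothetical and chosen only for illustration.

```python
from math import prod

HOURS_PER_YEAR = 8760

def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)

def replicated_availability(availability, replicas):
    """Availability of a tier with independent replicas, any one of which suffices."""
    return 1.0 - (1.0 - availability) ** replicas

def expected_downtime_hours(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR

# Hypothetical tiers and availability figures, for illustration only.
tiers = {"web_server": 0.99, "app_server": 0.99, "database": 0.99}

# No replication: the service is only as available as the whole chain.
single = serial_availability(tiers.values())

# Each tier replicated once (two copies per tier).
replicated = serial_availability(
    replicated_availability(a, 2) for a in tiers.values()
)

print(f"single copy per tier: {single:.6f}, "
      f"{expected_downtime_hours(single):.1f} hours down per year")
print(f"two copies per tier:  {replicated:.6f}, "
      f"{expected_downtime_hours(replicated):.1f} hours down per year")
```

Under these assumptions, the unreplicated chain is less available than any single component, while replicating every tier raises overall availability substantially. In practice failures are rarely fully independent, which is one reason the application design itself must also account for availability.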
Implementing a Continuous Availability methodological approach is the key to an organized and methodical way to achieve 24x7 or near 24x7 availability. Begin this process by understanding business process needs and expectations and the vulnerabilities and risks of the network infrastructure (e.g., Internet, Intranet, Extranet), including undertaking single-points-of-failure analysis. As part of considering implementation of Continuous Availability, the organization should examine the resiliency of its network infrastructure and the components thereof, including the capability of its infrastructure management systems to handle network faults and network configuration and change, the ability to monitor network availability, and the ability of individual network components to handle capacity requirements. See Figure 6 for a pictorial representation of this methodology:
Figure 6 - Continuous Availability Methodological Approach
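The single-points-of-failure analysis mentioned above can be sketched in a few lines of code. The Python example below is a simplified illustration under stated assumptions, not part of the methodology in Figure 6: it models the infrastructure as a directed graph, removes one component at a time, and checks whether a path from the entry point to the target service still exists. The topology and all component names are hypothetical.

```python
from collections import deque

def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal, ignoring the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def single_points_of_failure(graph, start, goal):
    """Return intermediate nodes whose loss alone disconnects start from goal."""
    candidates = set(graph) - {start, goal}
    return sorted(
        node for node in candidates
        if not reachable(graph, start, goal, removed=node)
    )

# Hypothetical topology: two web servers behind one firewall, one database.
topology = {
    "internet": ["firewall"],
    "firewall": ["web_a", "web_b"],
    "web_a": ["database"],
    "web_b": ["database"],
    "database": [],
}

spofs = single_points_of_failure(topology, "internet", "database")
print("Single points of failure:", spofs)
```

In this hypothetical topology the web servers are redundant, but the lone firewall is not. The endpoints themselves (here, the database) are excluded from the candidate set and would need to be assessed separately.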
The CA methodological approach is a systematic way to consider and move forward on achieving a continuously available web-based environment. A very high-level overview of this methodology is as follows:
Along these lines, in their book Blueprints for High Availability: Designing Resilient Distributed Systems8, Marcus and Stern recommend several fundamental rules for maximizing system availability (paraphrased):
The Marcus and Stern book8 is an excellent reference for preparing for and implementing highly available systems.
Reengineering the continuity planning process involves not only reinvigorating continuity planning processes but also ensuring the web-based enterprise needs and expectations are identified and met through implementation of continuous availability disciplines.
The failure of organizations to measure the success of their CP implementations has led to an endless cycle of plan development and decline. The primary reason for this is that a meaningful set of CP measurements has not been adopted to fit the organization's future state goals. Because these measurements are lacking, expectations of both top management and those responsible for CP often go unfulfilled. Statistics gathered in the Contingency Planning & Management/KPMG Continuity Planning Survey support this assertion. Based on this, a radical change in the manner in which organizations undertake CP implementation is necessary. This change should include adopting and utilizing the Business Process Improvement (BPI) approach for CP. This BPI approach has been implemented successfully at many Fortune 1000 companies over the past twenty years. Defining CP as a process, applying the concepts of the CP Value Journey, expanding CP measurements utilizing the CP Balanced Scorecard, and exercising the Organizational Change Management (OCM) concepts will facilitate a radically different approach to CP. Finally, since web-based business processes require 24x7 uptime, implementation of continuous availability disciplines is necessary to ensure that the CP process is as fully developed as it should be.
Table 1
Activities - Activities are things that go on within a process or sub-process. They are usually performed by units of one (one person or one department). An activity is usually documented in an instruction. The instruction should document the tasks that make up the activity.
Benchmarking - Benchmarking is a systematic way to identify, understand, and creatively evolve superior products, services, designs, equipment, processes, and practices to improve the organization's real performance by studying how other organizations are performing the same or similar operations.
Business Process Improvement (BPI) - Business Process Improvement is a methodology that is designed to bring about step-function improvements in administrative and support processes using approaches such as FAST, process benchmarking, process redesign, and process reengineering.
Comparative Analysis - Comparative Analysis is the act of comparing a set of measurements to another set of measurements for similar items.
Enabler - An enabler is a technical or organizational facility/resource that makes it possible to perform a task, activity, or process. Examples of technical enablers are personal computers, copying equipment, decentralized data processing, voice response, etc. Examples of organizational enablers are enhancement, self-management, communications, education, etc.
FAST - Fast Analysis Solution Technique is a breakthrough approach that focuses a group's attention on a single process for a one- or two-day meeting to define how the group can improve the process over the next 90 days. Before the end of the meeting, management approves or rejects the proposed improvements.
Future State Solution - A Future State Solution is a combination of corrective actions and changes that can be applied to the item (process) under study to increase its value to its stakeholders.
Information - Information is data that has been analyzed, shared, and understood.
Major Processes - A major process is a process that usually involves more than one function within the organization structure, and its operation has a significant impact on the way the organization functions. When a major process is too complex to be flowcharted at the activity level, it is often divided into sub-processes.
Organization - An organization is any group, company, corporation, division, department, plant, or sales office.
Process - A process is a logical, related, sequential (connected) set of activities that takes an input from a supplier, adds value to it, and produces an output to a customer.
Sub-process - A sub-process is a portion of a major process that accomplishes a specific objective in support of the major process.
System - A system is an assembly of components (hardware, software, procedures, human functions, and other resources) united by some form of regulated interaction to form an organized whole. It is a group of related processes that may or may not be connected.
Tasks - Tasks are individual elements and/or subsets of an activity. Normally, tasks relate to how an item performs a specific assignment.
References:
Contingency Planning & Management, January/February 2001. (The survey was conducted in the U.S. in October 2000 and consisted of readers and respondents drawn from Contingency Planning & Management magazine's domestic subscription list. Industries represented by respondents include Financial Services, Manufacturing/Industrial, Telecommunications, Education, Utilities, Healthcare, Insurance, Retail/Wholesale, Petroleum/Chemical, Information/Data Processing, Media/Entertainment, and Computer Services/Systems.)
H. James Harrington, Erick K. C. Esseling, Harm Van Nimwegen, Business Process Improvement Workbook, McGraw-Hill, 1997.
Robert S. Kaplan, David P. Norton, Translating Strategy Into Action: The Balanced Scorecard, HBS Press, 1996.
Gartner Group RAS Services, COM-12-1325, 29 September 2000.
Evan Marcus and Hal Stern, Blueprints for High Availability: Designing Resilient Distributed Systems, John Wiley & Sons, 2000.