Mellon Financial Corporation

October 21, 2002

Jennifer J. Johnson
Secretary, Board of Governors of the Federal Reserve System
20th Street and Constitution Ave. NW
Washington, DC 20551
Docket No. R-1128

State of New York
Superintendent of Banks
2 Rector St., 19th Floor
New York, NY 10006

Office of the Comptroller of the Currency
250 E Street, SW
Public Information Room, Mail Stop 1-5
Washington, DC 20219

Jonathan G. Katz
Secretary, Securities and Exchange Commission
450 5th Street, NW
Washington, DC 20549-0609
File No. S7-32-02

Re: Docket No. R-1128

Dear Ladies and Gentlemen:

Mellon Financial Corporation, Pittsburgh, Pennsylvania ("Mellon"), appreciates the opportunity to comment on the Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System. We applaud the initiatives taken to meet with representatives from our industry, and the efforts made to develop sound practices to insure business continuity in the critical financial markets.

Since last year's tragic September 11th events, we have embarked on a comprehensive review of our contingency planning efforts. As a part of that effort, we have also played a very active role within our industry. We would like to offer our observations and comments on the business recovery implications, testing, staffing, and the potential financial implications of the approach suggested by the draft interagency white paper.

We have organized our response into specific sections to address your comments in terms of technology recovery, business recovery, and implementation timeline and investment considerations. We conclude with our response to your request for other comments.

Executive Summary

In order to address our mutual goal of increasing the resiliency of the financial services sector, we have been actively engaged on a number of fronts to better understand the issues and implications of the events of September 11. We have participated in a number of industry forums, and through the Financial Services Roundtable and BITS, we have worked to improve the industry's ability to manage a crisis and gain a better understanding of options and alternatives to improving the resiliency of the financial services industry. We have participated with BITS in the development of their response to the draft white paper, and are generally in agreement with their positions and recommendations. Our response will highlight the recommendations and approaches that we believe will improve the ability of the financial services sector to respond to a crisis.

Technology Recovery

First, we believe that data integrity is a critical factor to business resumption and recovery. It is our opinion that if production database systems are not synchronized between a primary and alternate processing facility, recovery time will become unpredictable and additional file recovery steps up to and including restoration from tape backup could occur. Given the current storage and fiber optic networking technology limitations, we believe that a technology recovery requirement spanning a large distance may, in fact, reduce the resiliency of our systems.

In order to insure that the primary and back-up databases are consistent, a hardware or software solution must be used to maintain production data in a synchronous fashion. There are a number of factors that define the maximum separation between synchronous data centers. First, given the current state of technology, synchronous data transport is generally limited to a range of 100 fiber kilometers. Second, fiber optic network availability and routing in a region will dictate the separation of data centers, since fiber miles do not translate into highway or airline miles. Third, application design and performance characteristics are key contributors to the effective distance between data centers. Therefore, we believe that it is not practical to assign a specific data center distance objective for the industry.
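
One reason distance constrains synchronous replication is propagation delay, since each write must be acknowledged by the remote site before it completes. The following is a rough sketch only. It assumes that light travels through optical fiber at about 200,000 km per second (roughly 5 microseconds per fiber kilometer) and ignores equipment and protocol overhead, which would add further delay. The function name and the sample distances are illustrative and are not taken from the proximity study.

```python
# Rough estimate of the propagation delay added to each synchronous write.
# Assumption (not from the letter): light in optical fiber covers ~200,000 km/s,
# i.e. about 5 microseconds per fiber kilometer. Switch, repeater, and protocol
# overhead are ignored, so real delays would be higher.

FIBER_KM_PER_SECOND = 200_000.0


def round_trip_delay_ms(fiber_km: float) -> float:
    """Return the round-trip propagation delay in milliseconds."""
    one_way_seconds = fiber_km / FIBER_KM_PER_SECOND
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    # Sample fiber route lengths, chosen only for illustration.
    for km in (40, 100, 500):
        print(f"{km:>4} fiber km: ~{round_trip_delay_ms(km):.1f} ms added per write")
```

Under these assumptions, a 100 fiber-kilometer route adds about one millisecond to every acknowledged write, and the delay grows in proportion to the routed fiber distance rather than the straight-line distance.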

When considering selective application recovery, we believe that the technology environment that supports core settlement and critical market activities can be characterized as high volume, complex and requiring a high degree of change as new enhancements and features are added to these systems. We do not believe that it is possible to maintain system, application and database configurations on a selective or prioritized recovery basis for complex applications. Therefore, we believe that all production applications should be capable of recovery to insure that critical components are not missing at time of disaster.

We understand the critical staffing risks and believe that data centers that were previously operated as "dark" sites must now be staffed with resources capable of immediately reacting to a disaster scenario. This will avoid the transportation issues of September 11 and help support the rapid recovery of critical application systems. At the same time, we believe this desired objective would be difficult to attain.

We do not believe that a recovery objective of two to four hours is achievable for all high volume applications, but rather suggest that a minimum of an eight-hour recovery objective is a reasonable goal. We have significant concerns regarding recovery time objectives where the application is not supported by synchronous data replication mechanisms. Furthermore, we are concerned that a firm that does not prioritize the integrity of its data between data centers will not be able to effect recovery on a timely basis.

Regarding implementation timelines, it is not possible to comment specifically since the actual requirements are not yet finalized. Once specific objectives are defined, however, we would then be able to comment on the time required to develop a meaningful plan.

In summary of our technical findings and considerations, we believe that the following technology approach most closely adheres to the overall objectives of the draft white paper:

  • Establish a relationship between the primary and alternate data centers such that synchronous data replication of all production data is possible.

  • Establish a recovery capability so that all production applications can be recovered at time of disaster.

  • Staff both data centers to insure that at time of disaster operational resources are in place to immediately effect recovery efforts. In-region resources should be considered as a viable staffing option as long as regional transportation issues and other requirements are considered.

  • A recovery time objective of a minimum of eight hours should be established for all applications.

  • The technology implementation timeline cannot be established pending a better understanding of the technical requirements.

Business Recovery

When considering business recovery, implementation timelines and investments, we suggest that clarification of the identification of critical activities is required. Once defined, meaningful effort on staff, space, equipment and investment requirements can begin.

The implications of staff separation can be significant considering operational control requirements and potential inefficiencies related to organization, space and equipment. Streamlining efforts that organizations have invested in to centralize and reduce costs in operational areas will now be negated, and require additional investments to redistribute these activities.

Criteria on staff separation should be directed to specific markets, subject to the relative impact of the markets on the financial system. Recognition that risk varies by region should also be considered and criteria specific to distance and staff separation should be provided as guidelines or recommendations. The regulatory expectations need to be flexible with the understanding that firms should be required to demonstrate their true recovery capabilities or mitigating solutions if a regional solution is implemented. Alternate sites located 20-40 miles from the primary site should accommodate the vast majority of potential disruptions, and also permit greater scale economies in their development and maintenance.

We are in favor of increased and broader testing, although it is unclear how these tests would be coordinated across financial institutions. We recommend that BITS or another industry group develop the testing objectives, approach and methodology.

Given the continued movement towards globalization, we recommend that the regulatory agencies coordinate their activities and expectations with foreign regulatory bodies. Standard expectations should be set across the industry globally, so that firms are not forced to comply with differing sets of regulations.

Implementation Timeline

We do not believe that it is possible to reasonably estimate the implementation timelines given the current lack of clarity on the future directions on technology and business recovery. We recommend that as the requirements are solidified, a period of six to nine months should be established for firms to develop their approach and initial plans to meet the new requirements. Following this period, monitoring should be considered to insure that sufficient progress is occurring with each institution's plans.

Investment Requirements

Regarding the investment requirements, we would expect a wide range of investment to be required on a company-by-company basis. Since additional clarity is required on both technology and business recovery objectives, we are not able to estimate the overall investment requirements. However, we believe that the investment requirements will be material and will not be offset by productivity increases.

Conclusion

In closing, we are helping to drive a Financial Services Symposium on Business Continuity and Data Center Strategies with our fellow institutions. This meeting will be held in New York on November 18, 2002.

We recommend that the regulators take a proactive role in driving a national command center that unites all of the critical players needed for successful recovery efforts. The events of September 11 clearly emphasized the overall coordination effort required within the financial services sector and our dependence on the telecommunications industry. Without the needed redundancy and resilience in the telecommunications industry, all of the financial services industry's initiatives to improve our recovery efforts are in jeopardy.

We commend your efforts to help improve the resiliency of the industry and thank you for considering the views of Mellon Financial Corporation on these important issues. We appreciate your willingness to listen to what we have learned from our experience in researching this topic. If you have any further questions or comments on these matters, please do not hesitate to contact Susan Vismor, Senior Vice President, Corporate Crisis Management Coordinator (412-236-2196).

Sincerely,

Allan P. Woods
Vice-Chairman and CIO
Mellon Financial Corporation

cc: Martin G. McGuinn, Chairman and Chief Executive Officer


Table of Contents

Section 1: Technical Recovery Considerations

Section 2: Business Recovery Considerations

Section 3: Implementation Timeline and Investment Requirements

Section 4: Request for Comments

Section 1: Technical Recovery Considerations

Given the events of September 11 and the interagency draft white paper, we have been actively researching alternatives and lessons learned. Since September 11, we have participated in lessons-learned interviews with members of the financial services and technology sectors, and participated in and led a number of government and regulatory forums.

We believe that the following guiding principles are critical to insuring the resiliency of the financial sector's processing environment:

  1. All production data is synchronized on a real-time basis between two data centers which need to be within a 100 fiber kilometer distance of each other;

  2. All production applications are recovered in a disaster;

  3. Production processing occurs in both data centers on a day-to-day basis;

  4. A recovery time objective minimum of 8 hours is maintained for all applications; and

  5. Operational staff is located in separate facilities on a permanent basis (while recognizing this will likely be the most difficult objective to meet).

When considering each guiding principle against technology recovery objectives, we believe that these principles best support the objectives of the White Paper:

    (1) Synchronous data replication insures that data loss does not occur between multiple data centers. With the elimination of the potential of data loss, recovery time is reduced and customer data is protected. Our proximity study focused on the technical limitations of storage subsystems and fiber optic networking on synchronous data replication. The study concludes that, due to technical limitations, the distance between the two centers cannot exceed the following:

      (i) Mainframe data center configuration: 40 fiber kilometers

      (ii) Synchronous data replication: 100 fiber kilometers

      (iii) Distributed processor configuration: 60 kilometers

        (See Table 1)

    (2) Recovery of all production applications insures that key applications, application components and files are not overlooked in recovery planning. In complex environments that undergo a high degree of change, we believe that selective recovery is an error-prone process. If a single application component or file were missing, recovery time could be extended. It is our belief that selective recovery is not a viable option for any multi-line financial institution. Additionally, selective recovery would generally require a serial recovery process that would entail first recovering the most critical application, and then proceeding to begin the recovery of the next most critical application. The recovery time of each application would be elongated by the amount of time that it takes to recover each application ahead of it. Further, most business processes consist of a number of complex functions that require a number of subsystems for processing. It is our belief that any major financial institution that operates in a high-volume processing environment will have similar considerations. We do not believe that selective recovery of more than one application could be completed in the recovery time objectives laid out by the white paper. Also, in a practical sense, it may not be possible to predict a firm's ability to recover since each and every change to a production environment can destroy the ability to recover an application.

    (3) To increase the confidence that a firm can recover at time of disaster, we believe that both data centers should process production workload and not be relegated to a primary and backup configuration status. Confidence in the processors, storage, and network services that are required to facilitate a recovery is higher when the facility is used to support production applications, which in turn reduces the time to recover. Additionally, since each data center is supporting unique production workloads, applications that are hosted at the surviving center would not be affected by a failure at the alternate facility.

    (4) Recovery time is critical to the resiliency of the financial services sector. A number of factors contribute to a firm's ability to recover as noted above. The actual time to recover would be a function of the disaster scenario and the placement of production work between the two active data centers at time of disaster. We believe that a two to four-hour recovery period is not practical given the technology limitations of data synchronization and the complexity of the systems in the industry. We believe that a minimum of eight hours from time of disaster would be a more achievable and realistic goal.

    (5) After September 11, the ability to operate a "lights out" data center has been challenged given the increased terrorist threat and possible issues with transportation during a disaster. We believe that the best scenario, if possible, would be that each center should be staffed such that each location can independently effect a recovery operation in the event of loss of a data center. However, this will be extremely difficult to achieve.
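
The serial recovery concern raised in point (2) above can be illustrated with a minimal sketch. All durations are hypothetical, and the function names are introduced only for this example. The sketch assumes that applications are recovered strictly one after another in priority order, so that each application is available only once every application ahead of it has been recovered.

```python
# Sketch of the serial-recovery concern in point (2): when applications are
# recovered one after another, each one finishes only after all of the
# higher-priority recoveries ahead of it. All durations are hypothetical.

from itertools import accumulate
from typing import List, Sequence


def serial_completion_hours(recovery_hours: Sequence[float]) -> List[float]:
    """Return the elapsed time at which each application is back, in priority order."""
    return list(accumulate(recovery_hours))


def apps_within_objective(recovery_hours: Sequence[float], objective_hours: float) -> int:
    """Count how many applications are recovered within the objective."""
    return sum(1 for t in serial_completion_hours(recovery_hours) if t <= objective_hours)


if __name__ == "__main__":
    hypothetical_hours = [3.0, 2.0, 2.5, 1.5]  # most critical application first
    print(serial_completion_hours(hypothetical_hours))   # [3.0, 5.0, 7.5, 9.0]
    print(apps_within_objective(hypothetical_hours, 4))  # 1
    print(apps_within_objective(hypothetical_hours, 8))  # 3
```

With these illustrative figures, only the first application is back within four hours, even though no single application takes longer than three hours to recover on its own.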

Table 1: Technology Distance Limitations

| Technology | Typical Fiber Distance Limitation | Primary Technology Recovery Benefits / Drivers |
|---|---|---|
| Mainframe (IBM) | | |
| Parallel Sysplex | 40 km | Current benefit purely cost and GDPS features not much benefit in current configuration |
| Geographically Dispersed Parallel Sysplex (GDPS) | 40 km | Increases the fail-over automation capability for mainframe systems |
| Escon | 3 km | Current benefit purely expense based. |
| Ficon | 20 km w/o repeaters; 100 km w/repeaters (30% reduction in data rate) | Increases the fiber distance restrictions of Escon. Reduces the fiber and dark fiber networking costs when compared to Escon |
| Disk (EMC) | | |
| SRDF - synchronous | 100 km | Provides for zero data loss |
| SRDF - semi-synchronous | 200 km | Can achieve greater distance than in synchronous mode. Performance degrades as distance increases resulting in the potential for data loss. |
| SRDF - adaptive | 200 km | Can achieve greater distance than in synchronous mode. Performance degrades as distance increases resulting in the potential for data loss. |
| Distributed Systems | | |
| IBM RS6000 | | |
| HACMP | 0 km | Provides fail-over and DR capabilities for RS6000 platform at zero distance for minimum cost. Typically used within the same data center for processor failure recovery. |
| HAGEO | | IBM to provide RS6000 disaster recovery at a distance - no automated fail-over |
| HP | | |
| HP MC/Service Guard | 10 km | Provides single campus cluster with automated fail-over |
| HP Metro Cluster | ~43-50 km | Provides single cluster with automated fail-over using EMC SRDF and at longer distance than HP Service Guard |
| HP Continental Cluster | 50 km | Provides multiple cluster semi-automated fail-over. HP Service Guard remains in use for local fail-over |
| Sun | | |
| Veritas (Sun Solaris) | 60 km | |
| Geospan VCS (Sun) | 60 km | |
| Windows 2000 | | |
| Microsoft MSCS | 9 km | Provides local or close proximity fail-over |
| Geospan MSCS (W2K/NT) | 60 km | Increases the fail-over automation capability and geographical distance for Microsoft NT 4.0 & Windows 2000 systems |
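
As a hedged illustration of how the limits in Table 1 interact, the sketch below treats the most restrictive technology in a given mix as the binding constraint on data center separation. The distances are transcribed from Table 1; the function names and the sample technology mix are hypothetical. HAGEO is omitted because the table gives no distance for it, and HP Metro Cluster is entered at the lower end of its range as a conservative assumption.

```python
# Typical fiber distance limits (km) transcribed from Table 1. HAGEO is omitted
# because the table gives no distance for it; HP Metro Cluster uses the lower
# end of its ~43-50 km range as a conservative assumption.
FIBER_LIMIT_KM = {
    "Parallel Sysplex": 40,
    "GDPS": 40,
    "Escon": 3,
    "Ficon (with repeaters)": 100,
    "SRDF synchronous": 100,
    "SRDF semi-synchronous": 200,
    "SRDF adaptive": 200,
    "HACMP": 0,
    "HP MC/Service Guard": 10,
    "HP Metro Cluster": 43,
    "HP Continental Cluster": 50,
    "Veritas (Sun Solaris)": 60,
    "Geospan VCS (Sun)": 60,
    "Microsoft MSCS": 9,
    "Geospan MSCS (W2K/NT)": 60,
}


def max_separation_km(technologies):
    """Return the fiber distance supported by every technology in the mix."""
    return min(FIBER_LIMIT_KM[name] for name in technologies)


def unsupported_at(distance_km, technologies):
    """Return the technologies whose typical limit is below the given distance."""
    return sorted(t for t in technologies if FIBER_LIMIT_KM[t] < distance_km)


if __name__ == "__main__":
    stack = ["GDPS", "SRDF synchronous", "Geospan VCS (Sun)"]  # hypothetical mix
    print(max_separation_km(stack))   # 40
    print(unsupported_at(60, stack))  # ['GDPS']
```

In this hypothetical mix the mainframe configuration, not the disk replication, determines how far apart the two centers could be placed.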

Section 2: Business Recovery Considerations

In order to clearly define the organizational components that are considered by the white paper, the agencies should clarify the identification of critical activities by providing a range of representative examples for each activity.

The draft paper recommends that once defined, critical financial activities should have back-up arrangements with sufficient out-of-region staff, equipment and data to recover production activities within the recovery time objectives. While such a redundancy will help insure more robust recovery capabilities, it also introduces new risks in terms of operational controls and potential inefficiencies considering organization, space and equipment. Some systems may need to be modified to add capabilities such as monitoring and synchronization controls within the split operations. Streamlining efforts that organizations have invested in to centralize and reduce costs in operational areas will now be negated, and require additional investments to redistribute these activities.

Additional guidance is needed with respect to out of region staffing requirements. Criteria should be directed to specific markets, subject to the relative impact of the markets on the financial system. Recognition that risk varies by region should also be considered. We believe that criteria specific to distance and staff separation should be provided as guidelines or recommendations. The regulatory expectations need to be flexible with the understanding that firms demonstrate their true recovery capabilities or mitigating solutions if a regional solution is implemented.

We believe that alternate sites located 20-40 miles from the primary site should accommodate the vast majority of potential disruptions, and permit greater scale economies in their development and maintenance. Each site should insure independent provisions for telecommunications, power, and water.

When considering the separation of specific activities, firms will potentially be faced with staff displacement issues, recruiting issues, and short-term loss of critical skills. Expense will be incurred to secure new workspace, while reuse of vacated workspaces will have to be addressed. In critical functions that require specialized equipment (e.g., check processing), additional investment will be required. Capacity planning considerations will now apply at both business-processing centers. Sufficient capacity will be required for peak demand periods in each center with the likelihood that a greater than 100% growth in overall capacity requirements could occur.
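
The capacity arithmetic behind this concern can be sketched as follows. This is an illustration under stated assumptions, not an estimate: it assumes a baseline of one center sized exactly for peak demand, and that after separation each center must carry the full peak on its own plus a headroom margin. The headroom figure and the function name are hypothetical.

```python
# Sketch of the capacity arithmetic behind splitting operations across two
# centers. The headroom figure is a hypothetical assumption.


def capacity_growth_percent(centers: int, headroom: float) -> float:
    """Percent growth in total capacity versus one center sized exactly for peak.

    Each center is assumed to carry the full peak load on its own, plus a
    fractional headroom margin (0.25 means 25%).
    """
    total = centers * (1.0 + headroom)  # in units of one peak load
    return (total - 1.0) * 100.0


if __name__ == "__main__":
    print(capacity_growth_percent(2, 0.0))   # 100.0
    print(capacity_growth_percent(2, 0.25))  # 150.0
```

Under these assumptions, two centers with no headroom already double total capacity, and any margin for growth or inefficiency pushes the increase above 100%.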

We are in favor of increased and broader testing, although it is unclear how these tests would be coordinated across financial institutions. This can be accomplished through BITS or another industry group developing the testing objectives, approach and methodology.

Clarity is also required to better understand the objective of "minimizing immediate systemic effects". Firms that are not prepared for normal transaction volumes with a complete application recovery will impact the resiliency of the finance sector. As recovery time elongates, the need for additional processing capacity (staff and computers) is likely to increase. Under those conditions a company will probably not be capable of normal aggregate daily volumes. They may be required to operate outside normal hours, they may not be able to settle in T+N timeframes, and they may have some transactions that are not reconciled. We believe that closure on the technology recovery approach will provide the ability to better gauge the effective improvements with respect to systemic effects.

Given the continued movement towards globalization, we recommend that the regulatory agencies coordinate their activities and expectations with foreign regulatory bodies. Standard expectations should be set across the industry globally, so that firms are not forced to comply with differing sets of regulations.

Section 3: Implementation Timeline and Investment Requirements

We do not believe that it is possible to reasonably estimate the implementation timelines given the current lack of clarity on the future direction of technology and business recovery. We recommend that as the parameters of the underlying assumptions become more definitive, a meaningful period of time should be established for firms to develop their approach and initial plans to meet the new timeline.

Overall, implementation timing is expected to vary on a case-by-case basis depending on the current staffing, geographic and technical configuration of a firm. Additionally, the white paper should recognize that given the magnitude of the projects that are considered here, lengthy planning cycles are required to insure proper execution. It is also important to note that although prioritization will be a key factor, in many cases, effort in this area will compete for resources that are required to conduct normal business.

Regarding the investment requirements, we would expect a wide range of investment to be required on a company-by-company basis. Since additional clarity is required on both technology and business recovery objectives, we are not able to estimate the overall investment requirements. However, we believe that the investment requirements will be material and will not be offset by productivity increases.

Section 4: Request for Comments

In order to address our mutual goal of increasing the resiliency of the financial services sector, as we indicated above, we have been actively engaged on a number of fronts to better understand the issues and implications of the events of September 11. We have participated in a number of industry forums, and through the Financial Services Roundtable and BITS, we have worked to improve the industry's ability to manage a crisis and gain a better understanding of options and alternatives to improving the resiliency of the industry. We have participated with BITS in the development of their response to the draft white paper, and are generally in agreement with their positions and recommendations.

Have the agencies excluded any critical markets from the list?

    The Draft White Paper currently defines Critical Markets by example - i.e., Federal funds, foreign exchange, etc. markets - although in its discussion of these markets the Paper recognizes these markets functionally. It is recommended that the definition of Critical Markets be the functional one contemplated in the applicable discussion - e.g., markets which enable financial institutions to adjust/manage their key cash and securities positions and those of their customers in order to manage material liquidity, market and other risks to the organization. The specific markets cited are apt examples of the markets contemplated, but using specific examples to define the term will ultimately, we believe, prove too limiting as financial circumstances change and develop. Defining the term functionally will be consistent with the approach taken to define "core clearing and settlement organizations."

Is there a need to define the term material in this context? If so, what should be used?

    Benchmarks should be defined for any nebulous terms. "Material" should represent an appropriate percentage of average market volume.

Have the agencies provided sufficient guidance for firms to determine whether they play significant roles in critical financial markets?

    No. For institutions not specifically mentioned (e.g., the top 15-20 largest banks) there is confusion as to who is impacted by this white paper.

Should the regulators notify firms when they are covered by the definitions?

    Yes, we believe they should. Firms should have some way to know that they are definitely included, whether it is a confidential notification or a benchmark formula.

Should the agencies establish an average daily dollar volume as a benchmark for either or both of these categories?

    We think that would be an acceptable approach; however, due to the wide variety of organizations, guidelines would be preferred to narrow rules.

Should the benchmarks differ by category or activity?

    If there are benchmarks provided, they should be category specific and distinguish between market and activity.

What impact would these definitions have on firms that do not meet the definitions? Would they, in effect, be expected to meet these expectations or risk being at a competitive disadvantage?

    We think for institutions that are on the fringe of these definitions, they could be at a competitive disadvantage; conversely, not having to comply with the guidance would likely result in a lower cost structure.

Should sound practices take into consideration the geographic concentration of the back-up sites of firms that as a group could play a significant role in critical markets?

    Sound practices should consider the concentration and location of institutions in any one given vendor that could represent a risk due to the first come, first served nature of many of these businesses.

Can firms that play significant roles in critical markets have no effective substitutes that can assume their critical activities (similar to core clearing organizations, which by definition have no effective substitutes)?

    We are not aware of any, though there are probably some cases of this.

Does the paper's definition of wide-scale, regional disruption provide sufficient guidance for planning for wide-scale, regional disruptions? Is there a need to provide some sense of duration of a wide-scale regional disruption?

    A disruption can be of many kinds, have differing geographic impacts, and have a longer or shorter-term impact. We do not believe that a sense of duration would be helpful, as this would be a fact that is known after the event, and not before.

Have the agencies identified the critical activities needed to recover and resume operations in critical markets?

    We believe that in general they have.

Is four hours a realistic and achievable recovery time objective for firms that play a significant role in critical markets?

    We believe that out of region back up sites cannot support synchronous processing. Without synchronous processing, transaction data will be lost. It is unclear how there can be a clean recovery with lost data. Accordingly, we think four hours recovery time is unrealistic.

Is two hours a realistic and achievable recovery time objective for core clearing and settlement organizations?

    We believe this is even more unrealistic, particularly because of the interconnectivity between systems for core clearing and settlement operations.

Should recovery and resumption-time objectives differ according to critical markets?

    From a business perspective a logical answer is yes; from a technology recovery perspective we would not make any such distinction.

What kind of testing is feasible/possible?

    Testing needs to be done end-to-end with all involved parties such as telecommunications companies, third party providers, and exchanges. Industry groups such as BITS, and/or the agencies, should help drive the capability to do more complete testing with all parties.

Will the regulatory agencies sponsor tests, especially with other industries? If so, how will these tests be conducted?

    Effective testing needs to be done on an end-to-end basis. For example, the Federal Reserve, New York Stock Exchange, American Stock Exchange and other critical government agencies need to have published and tested recovery plans.

Have the agencies sufficiently described expectations regarding out-of-region back up resources?

    We believe additional guidance would be helpful. Such guidance, however, needs to recognize that one size cannot fit all the key players.

Should some minimum distance from primary sites be specified for back-up facilities for core clearing and settlement organizations and firms that play significant roles in critical markets?

    No. We believe that there are other factors that contribute to the risk, as well as the soundness of a recovery solution. We think the regulators should have a more holistic view of the solution, and not concentrate only on the proximity issue.

What factors should be used to determine minimum distance?

    We do not think there should be a minimum distance. Also, based on our testing, we know that 200-300 miles separation is not realistic. Numerous factors need to be considered. What is eventually proposed needs to be presented as guidelines.

Should the agencies specify other requirements (e.g., back-up sites should not be dependent on the same labor pools or infrastructure components, including power grid, water supply and transportation systems)?

    In light of the complexity of the needs, the agencies should only propose recommendations or guidelines.

Should these expectations apply to back-up sites in foreign countries?

    When issuing supervisory guidelines, the Fed/OCC/SEC should be coordinating any position with other regulatory bodies such as the UK's FSA. Regulatory coordination is required to insure that multinational financial institutions that may choose the UK as an alternate processing site do not encounter inconsistent or contradictory requirements and are not impeded by the foreign regulators.

Are there alternative arrangements within a region that would provide sufficient resilience in a wide-scale regional disruption? What are they?

    Back-up sites within a region should accommodate the great majority of potential disruptions. Those sites should in turn have independent critical infrastructure, such as telecommunications, water and power.

Are there other arrangements that core clearing and settlement organizations should consider, such as common communication protocols, that would provide greater assurance that critical activities will be recovered?

    This is difficult to determine absent a defined critical disruption.

Is it feasible/realistic to require firms to complete plans no later than 180 days after the agencies issue their final views?

    Absent greater clarity about the nature of the disruption, it is hard to respond to this question. However, a six-month planning cycle is not realistic, especially because of the amount of change that a business may have to undergo to satisfy the guidelines.

Is it feasible/realistic to require all core clearing and settlement organizations to implement plans to establish out-of-region back-up resources by August 2003?

    No, unless the site build-outs are already in process.

What impact would such a requirement have on these firms and on the economy of regions with high concentrations?

    For cities like Pittsburgh, it could be detrimental to relocate a substantial amount of operations staff. If you multiply this by the other institutions in the city, it could have a very negative impact.

To insure that enhanced business continuity plans are sufficiently coordinated among participants, should specific implementation timeframes be considered?

    No, it seems that it would just add additional burdens and complications to the process.

Is it reasonable to expect firms that play significant roles in critical financial markets to achieve sound practices within the next few years?

    Yes. However, for some of the businesses, it may take several years to do this in a logical, phased approach (i.e., the migration of staff/job functions to other cities).

Should the agencies specify an outside date (e.g. 2007) for achieving sound practices to accommodate those firms that may require more time to adopt sound practices in a cost-effective manner?

    If the date is far enough out, such as 2007, it should be acceptable.

Would such a distant date communicate a sufficient sense of urgency for addressing the risk of a wide-scale regional disruption?

    Yes, it would allow businesses to address this issue in a well thought out approach, rather than in a purely reactive manner.

Are there any specific questions that BITS should raise with regard to the telecommunications infrastructure and the need for the telecom industry to address these vulnerabilities?

    BITS should demand that the financial services industry be guaranteed diversity by the telecommunications providers. The regulators should work with their counterparts in the telecommunication industry to insure that a diversity of telecommunications systems is in place.