A Vision for Data at the SEC
Commissioner Kara M. Stein
Keynote address to Big Data in Finance Conference
Oct. 28, 2016
Thank you, Michael [Barr], for that kind introduction. Thank you also to the University of Michigan and the Office of Financial Research for organizing this important conference. I am pleased to be here with you today. This conference is a vital opportunity to discuss some of the forces shaping financial markets and regulation.
“What hath God wrought.” That message, sent from Washington to Baltimore in 1844, signaled the arrival of the telegraph. Originally an exclamation, but sometimes written as a question, it captures both the wonder and uncertainty that new technologies can inspire. The telegraph proved to be a transformative technology. Information was no longer limited by the speed of a horse, but could spread almost instantaneously. Farmers in one part of the country could learn about prices in distant markets, and an expanding country would soon communicate from one coast to another.
The transformation of knowledge that began with the telegraph has continued for more than a century. Over the last several decades, in particular, the amount of data, and our capacity to store and process it, has grown at an astounding rate. These technologies have touched nearly every endeavor, including science, healthcare, and, of course, finance. Even the Bard has felt the effect – thanks to text analysis, Shakespeare will now have to share credit for several plays with Christopher Marlowe.
Railroads and telegraphs, then automobiles and telephones, fundamentally re-shaped society. By connecting distant locations with greater speed, they created new avenues for spreading goods and knowledge. The result was increased prosperity, convenience, and efficiency. The current revolution in data similarly allows us to make novel connections. However, instead of connecting physical locations, we’re connecting data points to uncover new insights. The result may be another leap forward in how knowledge develops and spreads.
But you know all this. Today, I want to drill down a little and talk about how these developments are affecting the financial markets and the SEC in particular. Specifically, I want to discuss four broad themes: the new opportunity data provides, why the SEC must keep up with data’s growing role in the markets, some of the challenges to keeping up, and some ideas for overcoming these challenges. I am going to focus on my perspective as a Commissioner at the SEC, but these themes likely have relevance to other financial regulators as well. Ultimately, regulators are being disrupted by new technology, and it is important to focus on what we should be doing about it.
However, before going further, let me say that my remarks today and the views I express are my own, and do not necessarily reflect the views of my fellow Commissioners or the staff of the SEC.
The new opportunity
Information – particularly fast and reliable information – has always been central to the financial markets. For example, shortly after the development of the telegraph, brokers began leasing their own telegraph lines so they could receive pricing information faster.
The securities laws, in large part, can be understood as a set of rules about information. What information is useful for public dissemination? How should we ensure that it is accurate and reliable? Does fairness require everyone to have the information at the same time? How do you protect nonpublic information? What information is necessary for price discovery? The federal securities laws speak to all of these questions.
In this sense, “big data” is a continuation of an old theme. In another sense, the developments in data over the last 10 to 15 years represent a wholly new phenomenon, in the same way that satellite imaging is completely different from surveying a landscape from the top of a hill. At that scale, patterns become evident that would have been impossible to piece together by considering one plot at a time.
This means that both market participants and regulators have new opportunities for developing knowledge. Moreover, this is a qualitatively different kind of knowledge, encompassing entire data sets in one pass rather than slowly accumulating insight from individual experiences.
Market participants have already seized this opportunity. They are using a wide range of data sources to cull signals about possible market movements. Among these data sources is information that the SEC makes available, the demand for which is enormous. In the last year, our website received over seven billion page views – that’s more than some major media sites. We also delivered more than two petabytes of data to visitors. I recall, not long ago, when a gigabyte seemed like a lot of data.
The SEC itself is also beginning to realize some of the potential of new data tools. In a recent case, we obtained a settlement against a large broker-dealer for its failure to adequately train its representatives who were selling certain complex debt instruments. What made the case unique was that, instead of relying on traditional investigative techniques, it was built on custom analytics. SEC industry specialists, working with a team of tech experts, developed tools to sift millions of trading records. Using this technique, they were able to identify over 8,000 retail customers for whom the investment in the complex debt instruments was inappropriate. A case like this might not have been possible in the past.
I believe, however, that enforcement is not the most important potential use of data for the SEC. By the time we start building a case, the harm has been done. I am much more interested in establishing rules of the road that can help prevent crashes than in waiting to deal with the aftermath. The best investor protection is preventing fraud and misconduct in the first place. The key question is this: how can we design policies and monitoring systems that support healthy market function and aid in compliance on the front end? Improved data tools have the potential to be uniquely powerful in this way. They can allow us to make better, more tailored policy choices that focus on actual risks to investors and the market. This approach will never replace the need for humans and human judgment, but it can improve the markets and help us make smart use of our limited resources.
The risk of being left behind
Data and technology present tremendous opportunities and benefits – but they have also opened the door to new and exceedingly complicated risks. Data is distributed across a range of electronic platforms, complicating the task of monitoring and examining market participants. Moreover, the variety of data has increased dramatically. This includes highly structured derivative transactions that are reported in competing taxonomies, such as FIX and FpML, as well as unstructured information, like social media posts and narrative reports. As a result, in addition to being an exciting opportunity, the spread of data and data tools also requires the SEC to make changes to keep up.
A number of recent developments highlight this need. Today’s electronic trading environment has significantly changed the capital markets. Algorithms called “matching engines” match electronic limit orders with electronic market orders. High-speed trading dominates, accounting for over 55% of trading volume in US equity markets. And liquidity provision has largely shifted from traditional market-makers to computerized systems that trade in fractions of a second across different trading venues and securities.
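For readers who want a concrete picture of what a matching engine does, the following is a minimal sketch, not a description of any real exchange's system. It assumes a simple price-time priority rule, integer prices in cents, and a single security; the class and method names (`MatchingEngine`, `add_limit`, `market`) are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count


class MatchingEngine:
    """Toy order book: resting limit orders matched against incoming market orders."""

    def __init__(self):
        self._bids = []      # max-heap via negated price: (-price, seq, [qty])
        self._asks = []      # min-heap: (price, seq, [qty])
        self._seq = count()  # arrival order, used to break ties at the same price

    def add_limit(self, side, price, qty):
        """Rest a limit order on the book; side is 'buy' or 'sell'."""
        key = -price if side == "buy" else price
        book = self._bids if side == "buy" else self._asks
        heapq.heappush(book, (key, next(self._seq), [qty]))

    def market(self, side, qty):
        """Fill a market order against the opposite side of the book.

        Returns a list of (price, quantity) fills, best price first.
        Any quantity that cannot be filled is simply dropped in this sketch.
        """
        book = self._asks if side == "buy" else self._bids
        fills = []
        while qty > 0 and book:
            key, _, remaining = book[0]      # best-priced, earliest order
            traded = min(qty, remaining[0])
            fills.append((abs(key), traded))
            remaining[0] -= traded
            qty -= traded
            if remaining[0] == 0:            # resting order fully consumed
                heapq.heappop(book)
        return fills
```

In this sketch, a market order to buy walks up the resting sell orders from the lowest price, and orders at the same price are filled in the order they arrived. Real venues layer many more order types, fee rules, and safeguards on top of this basic idea.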
When the flash crash happened on May 6, 2010, equities markets suddenly plunged and then rebounded. In contrast to the incredible speed of this disruption, it was months before the SEC and CFTC were able to gather the necessary information, churn through the data, and produce an analysis of trading on that one day. I was working in the Senate at the time, and I remember the uncertainty as we waited to understand whether the disruption signaled a vulnerability and whether it would happen again. Since then, the financial markets have experienced other temporary disruptions and mini-flash crashes.
These events implicate data both in their causes and in the ability of regulators to understand and respond. They also highlight the risk for regulators of driving a carriage in the age of Tesla – by the time you’re pulling out of the stable, everyone else’s autopilot will have passed you by.
Since the flash crash, the SEC has taken some steps toward developing enhanced monitoring capabilities. The SEC has, for instance, created the “Market Information Data Analytics System,” or MIDAS. MIDAS combines information from the consolidated tape and separate proprietary feeds to create a more complete picture of equity market activity. We need to go further, though. The SEC soon will consider a plan to create the largest data repository of securities trading activities that has ever existed. This is widely known as the “consolidated audit trail,” or CAT. This unprecedented data effort will help us, finally, move at highway speeds.
The SEC’s statutory mission involves three core objectives – to protect investors, to maintain fair and orderly capital markets, and to facilitate capital formation. The SEC’s mission is not changing in the Digital Age, but our tools for carrying out that mission must. We simply cannot be effective or efficient without a strategic approach to our mission, financial markets, and data.
Challenges to keeping up
So, recognizing that the growth in data represents both an opportunity and a necessity, how can the SEC best position itself to respond? There are a number of challenges. I am not going to address these exhaustively, but I want to talk about several that are critical.
The first condition to success is acquiring the right data. We need data that is relevant, timely, and high quality. This is not about simply increasing the volume of data – this requires being smart about the data we gather. When you want an apple, chopping up the tree usually isn’t the most efficient way to get it. We must instead carefully consider our possible sources, data gaps, and reporting methods. Where we mandate reporting, how can we ensure that our requirements keep pace with a quickly evolving market? Can we design reporting requirements so that our resources are spent on analysis instead of cleaning data sets? These are challenging, but important, questions that require a forward-looking approach.
Another condition to success is ensuring that computers can quickly and reliably interpret the data. Structured data can be an important part of this. SEC reporting in the last few years has begun to embrace better practices for structured data. For instance, we recently adopted reporting requirements for mutual funds and ETFs that will require the use of XML. Our existing Form PF, on which hedge fund managers report, also uses XML. Where structured data is not available or reliable, we should continue to explore techniques that allow better parsing of unstructured data.
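To illustrate why structured reporting matters for analysis, here is a minimal sketch of a program reading an XML filing directly, with no manual cleaning step. The element names (`holdings`, `holding`, `name`, `valueUSD`) and the figures are invented for this example and are not taken from Form PF or from any actual SEC schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical filing fragment; the tags and values are made up for illustration.
SAMPLE = """
<holdings>
  <holding><name>Example Corp Bond</name><valueUSD>1250000.00</valueUSD></holding>
  <holding><name>Sample Inc Equity</name><valueUSD>750000.50</valueUSD></holding>
</holdings>
"""


def total_value(xml_text):
    """Sum the reported value of every holding in the document."""
    root = ET.fromstring(xml_text)
    return sum(
        (Decimal(h.findtext("valueUSD")) for h in root.iter("holding")),
        Decimal("0"),
    )


def holding_names(xml_text):
    """List holding names in the order they appear in the filing."""
    root = ET.fromstring(xml_text)
    return [h.findtext("name") for h in root.iter("holding")]
```

Because each value sits in a labeled field, the same few lines work on one filing or on thousands. Narrative text would need far more fragile parsing to get the same result.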
We should also embrace identifiers, like the “legal entity identifier,” or LEI. I remember vividly the uncertainty following Lehman’s collapse. Regulators and counterparties were left to sort through the rubble, trying to piece together the consequences. It was a problem of modern complexity but without an equivalently modern solution. LEI helps solve that problem by providing a uniform and reliable way to identify counterparties. This frees regulators to focus on the financial risks instead of data issues. The new mutual fund reporting forms also include an important new requirement for funds to obtain an LEI.
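Part of what makes an identifier like the LEI reliable is that it can be checked mechanically. An LEI is a 20-character code defined by ISO 17442, and its last two characters are check digits computed with the ISO 7064 MOD 97-10 scheme. The sketch below shows that check under those assumptions; the function names and the 18-character prefix used in the usage note are hypothetical, and the prefix is not a registered LEI.

```python
import re

# An LEI is 18 alphanumeric characters followed by two numeric check digits.
_LEI_SHAPE = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def _to_number(text):
    """Map digits to themselves and A..Z to 10..35, then read as one integer."""
    return int("".join(str(int(ch, 36)) for ch in text))


def check_digits(prefix18):
    """Compute the two check digits for an 18-character prefix."""
    return f"{98 - _to_number(prefix18 + '00') % 97:02d}"


def is_valid_lei(lei):
    """True if the string has the LEI shape and its checksum is consistent."""
    return bool(_LEI_SHAPE.fullmatch(lei)) and _to_number(lei) % 97 == 1
```

For example, `"529900HYPOTHETICAL" + check_digits("529900HYPOTHETICAL")` yields a string that passes `is_valid_lei`, while the same string with its last digit altered does not. A passing checksum only shows that the code is well formed. Confirming that it belongs to a particular entity still requires a lookup against the LEI registry.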
The growing use of XML and LEI is an important step. However, we can still do more to ensure that we have data that is ready for efficient analysis.
Another significant challenge is limited resources. Using data effectively depends on having both people with specialized skills and sophisticated systems. In addition to our excellent staff of lawyers and accountants, we need more professionals with the right technical skills. Our systems and software also need to keep up with the speed and volume of today’s markets. Our main information reporting system, EDGAR, was a huge leap forward 20 years ago, but it is ancient in tech years. With a limited budget and an ever-evolving market, we may never drive the latest model. We must, however, find a way to keep pace.
We also need to remain vigilant about protecting proprietary information. Cybersecurity is a constantly evolving risk – the latest large-scale attacks in the news, directed at certain internet utilities, may have been launched using webcams. As a result, we have to be creative and dynamic in developing responses.
We must also ask how new data tools can improve our ability to deliver decision-useful information to investors. I have spoken before on the need to create a Digital Disclosure Task Force to help us reimagine how the SEC acquires and provides data to investors and market participants. I believe that we have an opportunity to reduce the burden on companies while at the same time providing better disclosure to investors.
Another condition to success is that we never lose sight of the human element. No matter how powerful the processor, at the end of the day, humans develop the assumptions and metrics, design the analyses, program the algorithms, and interpret the results. The housing crisis forcefully reminded us that models are built from assumptions chosen by humans, and those can be fallible. Human biases are as much a part of digital databases and computer analysis as they are of any other source of knowledge. Accordingly, we must approach them with the same judgment, caution, and safeguards.
The final challenge I want to touch on is one of leadership. Success here depends on drawing together an array of expertise and making decisions with limited resources and within limited budgets. This cannot be done without a vision. The vision needs to be accompanied by a detailed roadmap and effective management. Without these, we risk missing the opportunity that data can provide.
Meeting the challenges
How can the SEC meet the challenges of developing and using data effectively? In recent years, we have made some progress. An important part of this has been the growth of our Division of Economic and Risk Analysis, or DERA, as well as several other groups that include analysts, quants, and economists (and even a physicist). These include talented folks in the Risk and Examinations Office in Investment Management, the Risk Analysis Examination Team in our exams office, and the Center for Risk and Quantitative Analytics in Enforcement. These groups have advanced our ability to handle data with rigor. They have also played an important role in everything from identifying risk to informing policy and conducting investigations. In addition, we have made some progress on the systems front with the development of MIDAS and, hopefully, in the near future, the advancement of CAT.
But more needs to be done, and it needs to be smart.
An important part of defining our vision should be the creation of an Office of Data Strategy. For some time, I have asked that the SEC develop an executive team responsible for creating and overseeing such an office. The office would be responsible for coordinating the creation of a data strategy addressing how we collect, manage, use, and provide data. This is a critical next step in turning our ad hoc growth as data users into a deliberate plan. The office could lend its expertise to, and would coordinate with, our policy, exam, and enforcement offices. Having a data strategy, and a team dedicated to it, is especially important in light of our limited resources.
Just as the telegraph ushered in a new information era, the spread of data and data tools is changing how information is used and shared. The promise is tremendous, and if the SEC can successfully harness the new technology, investor protection, financial stability, and the markets will all benefit.
Thank you again for inviting me to talk with you all. You are engaged in a fascinating and important conversation, and I look forward to hearing about your new ideas in this area.
 See Daniel Walker Howe, What Hath God Wrought: The Transformation of America, 1815-1848 (Oxford History of the United States) (2007).
 See Travis M. Andrews, Big Debate About Shakespeare Finally Settled by Big Data: Marlowe Gets His Due, The Washington Post (Oct. 25, 2016), available at https://www.washingtonpost.com/news/morning-mix/wp/2016/10/25/big-data-helps-put-centuries-old-shakespearean-debate-to-rest/.
 The first telegraph lines became operational in 1844. The first stock ticker followed in 1867. “In 1873, brokers began leasing private telegraph lines to obtain pricing data faster and execute trades earlier.” Jason Zweig, Wall Street, 1889: The Telegraph Ramps Up Trading Speed, The Wall Street Journal (Jul. 7, 2014), available at http://www.wsj.com/articles/wall-street-1889-the-telegraph-ramps-up-trading-speed-1404765917. See also Tom Standage, The Victorian Internet: The Remarkable Story of the Telegraph and the Nineteenth Century’s On-Line Pioneers (1998).
 See SEC Charges UBS with Supervisory Failures in Sale of Complex Products to Retail Investors (Sept. 28, 2016), available at https://www.sec.gov/news/pressrelease/2016-197.html. See also In the Matter of UBS Financial Services, SEC Release No. 78958 (Sept. 28, 2016), available at https://www.sec.gov/litigation/admin/2016/34-78958.pdf.
 Id. (“We can now analyze literally hundreds of millions of trading records using sophisticated coding techniques that allow us to build platform wide cases rather than cases built investor by investor.”).
 See Establishing the Form and Manner with which Security-Based Swap Data Repositories Must Make Security-Based Swap Data Available to the Commission, SEC Release No. 34-76624 (Dec. 11, 2015), available at https://www.sec.gov/rules/proposed/2015/34-76624.pdf.
 Austin Gerig, High-Frequency Trading Synchronizes Prices in Financial Markets, DERA Working Paper Series (Jan. 2015), available at http://www.sec.gov/dera/staff-papers/working-papers/dera-wp-hft-synchronizes.pdf.
 Austin Gerig & David Michayluk, Automated Liquidity Provision, DERA Working Paper Series (Dec. 2014), available at http://www.sec.gov/dera/staff-papers/working-papers/dera-wp-automated-liquidity-provision.pdf.
 See Findings Regarding the Market Events of May 6, 2010, Report of the Staffs of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues (Sept. 30, 2010), available at https://www.sec.gov/news/studies/2010/marketevents-report.pdf.
 See Joint Industry Plan; Notice of Filing of the National Market System Plan Governing the Consolidated Audit Trail by..., SEC Release No. 34-77724 (Apr. 27, 2016), available at https://www.sec.gov/rules/sro/nms/2016/34-77724.pdf.
 See Data & Standards, Legal Entity Identifier (LEI), Office of Financial Research, available at https://www.financialresearch.gov/data/legal-entity-identifier/.
 See Nate Lanxon, et al., Connected Gadgets Blamed as Internet Recovers From Friday Attack, Bloomberg (Oct. 22, 2016), available at https://www.bloomberg.com/news/articles/2016-10-22/connected-gadgets-blamed-as-internet-recovers-from-friday-attack.
 See, for example, Disclosure in the Digital Age: Time for a New Revolution (May 6, 2016), available at https://www.sec.gov/news/speech/speech-stein-05062016.html.
 See Nanette Byrnes, Why We Should Expect Algorithms to Be Biased, MIT Technology Review (Jun. 24, 2016), available at https://www.technologyreview.com/s/601775/why-we-should-expect-algorithms-to-be-biased/?set=601766; Claire Cain Miller, When Algorithms Discriminate, New York Times (Jul. 9, 2015), available at http://www.nytimes.com/2015/07/10/upshot/when-algorithms-discriminate.html?_r=0.
 See, for example, The Dominance of Data and the Need for New Tools: Remarks at the SIFMA Operations Conference (Apr. 14, 2015), available at https://www.sec.gov/news/speech/2015-spch041415kms.html.