Gregg E. Berman
Associate Director, Office of Analytics and Research
Division of Trading and Markets
U.S. Securities and Exchange Commission
SIFMA TECH Conference, New York
June 18, 2013
Good Morning. It is a pleasure to be here today and I am quite excited to have the opportunity to speak with you about some truly transformational market technologies in use today at the SEC. In the past I’ve had opportunities to discuss market structure at SIFMA’s fall conference, but this is my first time at SIFMA’s tech conference.
However, this conference covers topics I am quite comfortable with, and passionate about – prior to joining the SEC I had spent over a decade helping to build a financial technology company that specialized in complex data and analytics solutions.
As such, I hope I have at least a good working knowledge of the technology topics and issues of the day.
And as always, my comments today are based on my own experiences and opinions and do not necessarily represent the views of my colleagues at the SEC or of the Commission itself.
Let me begin by setting the stage. The date was May 6, 2010. The time was 2:45 in the afternoon. And the S&P had just fallen 5% in only 5 minutes. Ten or so minutes later, it fully recovered. Thus was born what is now commonly known as the Flash Crash.
Our job was to team up with the staff at the CFTC, figure out what happened, and write up a report as soon as possible. If not for a few car chases, this task was worthy of an episode of Mission Impossible.
Given the interconnected nature of our markets, the first step in our analysis was to determine whether the Flash Crash was initially triggered by events in the cash equity markets or in the derivative futures markets. To perform such an analysis we unfortunately needed to fully reconstruct the order books for thousands of individual stocks – a process that involved building analytics to process billions of individual records and took us nearly 4 months.
Recall that at that time all bets were on the problem originating in the equity markets – the press, the pundits, and even the markets themselves believed the crash must have been directly caused by something in the cash equity markets. Speculation ranged from delays in market data; to problems arising from market fragmentation; to claims that one or more equity-based high-frequency traders suddenly went wild.
But what we were able to show by careful and painstaking analysis was that, contrary to initial perceptions, the problem actually originated in the futures market for S&P 500 “E-Mini” contracts and then quickly cascaded to the markets for individual equities.1
Our findings were initially unexpected, and even to this day there are those who remain skeptical of some of our conclusions. If you find yourself in that category I’d encourage you to download a recent and completely independent analysis of the Flash Crash performed by two researchers at the Duisenberg School of Finance in Amsterdam.
Their results are very consistent with our own findings, and indeed they determined that the Flash Crash originated in the futures market. More so, they were able to take their data analysis even further than we did in our report, and showed quite precisely how a very large sell order in the E-Mini futures market placed that afternoon interacted with the algorithms of other futures traders resulting in “disproportionately large pricing pressure” that drove down prices.2
Though we certainly learned a lot about the markets by studying the Flash Crash, we also learned how important it was to be able to accurately collect and analyze complete order book data, which is the lynchpin for these types of analyses.
This means processing and analyzing quotes and orders for every stock across ALL price points, not just select price points. It also means being able to track quotes and orders as they are modified, canceled, or hit, which results in a trade execution.
This is known as a “full depth-of-book” analysis.
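As a rough illustration of what a depth-of-book reconstruction involves, the Python sketch below replays a stream of order messages (add, modify, cancel, execute) and reports the shares resting at every price level, not just the best bid and offer. The message layout, field names, and sample values are assumptions made for this example; they are not the format of any actual exchange feed or of the analytics described in these remarks.

```python
from collections import defaultdict


def build_depth(messages):
    """Replay order messages and return resting shares per (side, price_cents)."""
    orders = {}  # order id -> [side, price_cents, remaining_shares]
    for msg in messages:
        kind, oid = msg["type"], msg["id"]
        if kind == "add":
            orders[oid] = [msg["side"], msg["price"], msg["shares"]]
        elif kind == "modify":
            # A modification replaces the price and remaining size of the order.
            orders[oid][1] = msg["price"]
            orders[oid][2] = msg["shares"]
        elif kind == "cancel":
            orders.pop(oid, None)
        elif kind == "execute":
            # An execution (the order was "hit") reduces the remaining size.
            orders[oid][2] -= msg["shares"]
            if orders[oid][2] <= 0:
                del orders[oid]
    depth = defaultdict(int)
    for side, price, shares in orders.values():
        depth[(side, price)] += shares
    return dict(depth)


# Hypothetical messages; prices are in cents to avoid floating-point keys.
messages = [
    {"type": "add", "id": 1, "side": "bid", "price": 2500, "shares": 100},
    {"type": "add", "id": 2, "side": "bid", "price": 2499, "shares": 300},
    {"type": "add", "id": 3, "side": "ask", "price": 2502, "shares": 200},
    {"type": "modify", "id": 2, "price": 2498, "shares": 250},
    {"type": "execute", "id": 3, "shares": 50},
    {"type": "cancel", "id": 1},
]
depth = build_depth(messages)
print(depth)
```

In this toy stream the order at the best bid is canceled, so the remaining bid interest sits one level down, which is exactly the kind of information a best-bid-and-offer view would not show.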
We also recognized the shortcomings of our needing to take nearly 4 months to perform a full depth-of-book analysis for just one day of trading.
So it was based on these realizations that we launched one of the SEC’s most advanced technology projects related to market data analysis.
On the heels of publishing our final report in the fall of 2010 we issued a public Request for Information (an “RFI”) calling for information on market data and analytics systems from any and all vendors. We received a wide range of responses and learned about a variety of potential tools and technologies.
For example, we learned about solutions that provide detailed analytics but lacked methods for collecting and storing vast amounts of market data on which the analyses would be performed.
We also learned about solutions that could readily store big sets of data in a generic fashion but lacked methods of performing analyses on this data.
And of course we heard from those who said they did not know much about market data or analytics, but were sure they could build whatever we wanted if we gave them enough money.
But mostly, we learned what we needed to know to further hone our requirements. And as a result, we were able to secure a budget and issue a public Request for Proposal (an “RFP”) in the Fall of 2011.
Our requirements were specific. We wanted a system that would enable us to readily collect and analyze all trade and quote data from the public tapes for equities, from the public tape for options, AND the so-called proprietary feeds from each of the equity exchanges and applicable futures exchanges. More so, we wanted a commercial off-the-shelf product, not something that was going to be custom built for the SEC, or that would involve a consulting project, or significant technology build-out.
As expected we again received responses from a variety of vendors, and, as has been publicly reported, finalized the award of a contract in July of 2012. We then did something a bit risky: we set a very ambitious goal of being able to roll out the system across the SEC by the end of the year, which was only 6 months away. Even riskier, we reported this goal to the press.
Then, something amazing happened. Within three months we had a beta system up and running. Advanced users were able to test the analytics with real data and we were even able to begin formal training sessions. By the end of the year we had a feature-complete system in operation. And in January of 2013 the production system was formally rolled out to the SEC.
So let me tell you a little about this truly transformational technology.
First, we call the system MIDAS. It’s an acronym for Market Information Data Analytics System. But like all good acronyms we picked MIDAS first and then figured out what the letters meant afterwards.
MIDAS is of course all about market data. As many of you may know, there are very specific rules and regulations regarding the collection and public dissemination of listed equity and options market data.
On the equity side, every trade of 100 shares or more, whether executed on or off a public exchange, is reported to one of three public feeds. Collectively, we call these feeds the consolidated tape.
In addition to data on trade executions, a parallel set of feeds also provides select data on quotes for stocks offered on each equity exchange.
A similar set of trade and quote data is also collected and disseminated for options data in what is known as the OPRA feed.
Today, nearly all market participants in one form or another use these feeds. Asset managers, hedge funds, mutual funds, retail traders, and investors typically access this data either directly, or through another system such as an on-line trading portal. This is also the data that is used to create the scroll you see at the bottom of financial news networks.
Unfortunately this data does NOT provide a complete picture of what occurs on our national exchanges. This is because the public feeds only provide data on the price and size of the best bid and best offer for each stock on each exchange. These data sources do not provide information on orders placed below the best bid or above the best offer. Nor do these sources provide data on trades of less than 100 shares. Without this data, you cannot perform full depth-of-book analyses, and you really can’t understand market structure.
To get to this data you need to additionally collect and process separate proprietary feeds made available by each of the exchanges. These feeds are typically used by only the most sophisticated of market participants such as market makers and high-frequency traders.
In fact, we have spoken with many buy-side market participants and have learned that even the largest and most sophisticated of these firms generally do not attempt to consume this data – it is extremely voluminous, challenging to process correctly, and requires specialized data expertise.
But it’s what we needed, and it’s what MIDAS gives us.
I find it incredible that in a period of only 6 months we went from being significantly behind most market participants in this area of technology to leap-frogging most buy-side firms and landing on par with regard to the data collection and analysis capabilities of many high-frequency trading firms.
But before I tell you what we’re doing with these new capabilities, I thought you’d be interested in knowing how we pulled it together so quickly. Simple – the entire system is hosted in the Cloud.
There is no hardware to support, no software upgrades to maintain, no data feeds to handle, and hence no SEC resources are required for these tasks. Users at the SEC simply log into MIDAS from their own desktops whenever they want to access the system.
If we need to perform a very large analysis, we employ multiple servers and invoke parallel jobs. Access to processing power is just not an issue, at least not at present.
I have to say I think this is a pretty progressive approach – indeed more so than many of today’s financial firms. In fact, there is a specific government program called FEDRAMP designed around the ability of government agencies to use the Cloud in a secure and reliable fashion. Our IT folks at the SEC have been on the vanguard of this new program, working closely with our vendor and cloud provider on FEDRAMP protocols.
Not bad for a government agency that’s almost 80 years old.
Of course MIDAS itself, though quite advanced, is still just a data collection, analysis and research platform. Its value ultimately derives from the people who use it and build upon it. So let me turn to how we are using it at the SEC.
The system has now been rolled out to many dozens of people. We have about 100 registered users from across the country: New York, LA, Chicago, San Francisco; and from across many SEC divisions and offices: Trading and Markets; Economics and Risk Analysis; Enforcement; and Compliance, Inspections, and Exams.
The system itself is based on UNIX. Users log in, are dropped into a UNIX shell, and have all of the powers of multi-processor UNIX workstations at their disposal. We have users writing C code, AWK scripts, and PYTHON modules; each customizing their analyses to the specific task at hand, sharing scripts, and collaboratively building larger processes.
Very iterative, and very agile.
To help everyone get started and introduce new users we hold regular two-day intensive training sessions on how to use MIDAS to analyze market structure. As might be expected, most users have a technical background: they are financial engineers, quants, economists, and market practitioners. But not everyone – believe it or not, some are securities lawyers.
In our recently-established Office of Analytics and Research we have market practitioners with technical backgrounds who are naturally adept at using systems such as MIDAS. Where they may need to spend some extra time is learning the nuances of specific SEC regulations. As such, they may have a copy of the Securities Exchange Act of 1934 on their desk, or more likely a bookmark to the PDF in their browser.
But what I really find interesting is when I see a copy of Unix in a Nutshell on the desk of an enforcement attorney.
In fact, last month we held an intro to MIDAS session for non-technical personnel and over 100 staff attended.
I recognize that there has been a lot of criticism of the SEC’s technical abilities and capabilities in these areas in the past. However, a lot has changed and I would strongly encourage those who remain skeptical to revisit some of their assumptions.
The potential uses of a technology like MIDAS are numerous but can generally be broken into three categories:
- Real-time monitoring of market activities,
- Forensic analysis of market events, and
- Market structure research that can more fully inform substantive policy decisions.
Allow me to describe these in reverse order.
In January of 2010 the SEC published a Concept Release on Equity Market Structure3 in which it asked numerous questions about the nature and impact of such topics as high-frequency trading and dark-pool trading on market efficiency, fairness, competition, and investor protection.
In the three years that have followed the publication of this document we have received many comment letters, read through many academic and industry studies, and have held numerous meetings with a wide variety of market participants on these topics. And though most participants generally express the opinion that today’s markets are significantly better and more efficient than they were a decade ago, we have heard many strong and opposing views on just about every aspect of where to go next, and how to address some of the more recent changes brought about by technology.
Traditionally, most of the data and analysis we learn about at the SEC comes from industry input and the public comment process. Indeed, we ask for this at almost every meeting with market participants. But when you are faced with so many opposing views, it’s also helpful to be able to perform your own analysis to complement the public comment process.
And that’s exactly what we are doing with MIDAS.
Every day we collect about 1 billion records from the prop feeds of 13 national equity exchanges. These feeds provide us with info on every trade and every displayed order, quote, modification, and cancellation on those venues. And not just on a daily basis, but also historically. In fact, our new technology allows us to readily perform analyses of thousands of stocks over periods of 6 months or even a year, building up statistics on 100 billion records.
We have learned, and are still learning, a lot by studying these feeds.
For example, one of the bigger concerns we hear from market participants is the ever-increasing speed at which orders are sent to, and often canceled at, exchanges.
Some have called for regulators to “slow down” the market, and some foreign jurisdictions have gone as far as proposing or adopting rules they believe will do just that.
But I personally find it difficult to approach the question of whether we should, and if so how we might, “slow down” the markets, or even determine that the markets are indeed too fast, until we meaningfully measure their actual speed.
Here are the issues: First, we need to be more deliberate and careful in the way we describe market speed, especially in the media. There is a humongous difference between a millisecond and a microsecond: the latter is a thousand times smaller than the former. So it behooves us all to stop throwing out terms that have precise meaning without any consideration of what we are really saying.
Second, I don’t think market participants really care how long it takes the average person to blink, so I’m not sure why everyone keeps comparing the speed of the market to this benchmark. I’m also not sure why, absent any other facts, people should be assuming it is problematic for trading to occur faster than the blink of an eye.
But aside from my concerns about general market misperceptions and hyperbole, I think we can and must do a much better job at assessing the speed of the market in ways that would more directly inform policy.
Allow me to explain why.
Most market participants, as well as most academics, base their observations about the speed and depth of the markets by analyzing the consolidated public tape. However, this view does not reveal the actions of individual market participants, even anonymously. The consolidated tape is rather a summary of the aggregate behavior of many market participants acting together, and reacting to each other.
Let’s talk numbers for a moment: assume that 10 market participants are each bidding 100 shares to buy the same stock on the same exchange at a price of $25. If $25 was the best bid on that exchange, the consolidated tape would show this as 1,000 shares bid at $25.
Now consider what happens if an 11th participant decides to join the bid with an additional 100 shares, but shortly thereafter an existing bidder decides to cancel his prior order for 100 shares. In this case the tape would show 1,000 shares of interest at $25 jumping to 1,100 shares, only to quickly fall back to 1,000 shares.
For all intents and purposes, it would seem that someone posted, and then quickly canceled, 100 shares – producing what is often called a flickering quote.
But who, in this instance, is the party that produced this flickering quote? The answer is, surprisingly, that no individual party produced the flickering quote. It is simply a ramification of distinct parties each acting, and then reacting, at similar times.
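The arithmetic of this example can be replayed in a few lines of Python. The sketch below tracks each participant's resting bid at $25 and records the aggregate size after every event, which is all that a best-bid view would display. The participant labels and the choice of which existing bidder cancels are assumptions made for illustration.

```python
def displayed_sizes(events):
    """Return the aggregate displayed size after each (action, participant, shares) event."""
    live = {}  # participant -> shares resting at the best bid
    sizes = []
    for action, participant, shares in events:
        if action == "post":
            live[participant] = live.get(participant, 0) + shares
        elif action == "cancel":
            live.pop(participant, None)
        sizes.append(sum(live.values()))
    return sizes


# Ten participants each bid 100 shares at $25.
events = [("post", f"P{i}", 100) for i in range(1, 11)]
# An 11th participant joins, then an existing bidder (here P3) cancels.
events += [("post", "P11", 100), ("cancel", "P3", 100)]

sizes = displayed_sizes(events)
tape_view = sizes[-3:]  # what an observer of the aggregate quote would see

# Participants who both posted and then canceled during the "flicker".
posted = {p for action, p, _ in events[10:] if action == "post"}
canceled = {p for action, p, _ in events[10:] if action == "cancel"}
flickering_parties = posted & canceled

print(tape_view)
print(flickering_parties)
```

The aggregate view moves from 1,000 to 1,100 and back to 1,000 shares, yet the set of parties that both posted and canceled is empty: the apparent flicker only exists in the aggregate.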
I think if we are to better understand the speed of the market, address concerns, and even consider actions, we must start by analyzing data that informs us on how market participants actually behave, and not limit ourselves to simply observing the results of their behavior in aggregate. Fortunately, we can do just that using tools like MIDAS.
This is because the data provided on many of the prop feeds is tagged in a way that allows us to track the life of individual orders with microsecond granularity. Using data like this we can much more robustly and meaningfully measure the speed of the markets by, for example, determining what fraction of participant orders are canceled within 1 second, 100 milliseconds, or even 10 microseconds.
By observing orders that result in executions, as opposed to cancellations, we can better understand how long it takes market participants to react to market quotes. Do market participants tend to hit quotes within 100 milliseconds after they are posted? Or is that indeed too fast?
Answering these basic factual questions must be the starting point for any serious dialogue about market speed, and I’m very glad we now have the tools and technology to be able to do so.
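A minimal sketch of this kind of measurement, assuming each order has already been reduced to an entry timestamp and an optional cancel timestamp in microseconds, might look like the following. The data layout and the sample orders are hypothetical and are not drawn from any actual feed.

```python
def fraction_canceled_within(orders, threshold_us):
    """Fraction of all orders canceled no later than threshold_us after entry.

    `orders` is a list of (entry_time_us, cancel_time_us) pairs; cancel_time_us
    is None for orders that were never canceled (for example, fully executed).
    """
    if not orders:
        return 0.0
    fast = sum(
        1
        for entered, canceled in orders
        if canceled is not None and canceled - entered <= threshold_us
    )
    return fast / len(orders)


# Hypothetical order lifetimes, in microseconds since entry.
orders = [
    (0, 5),          # canceled after 5 microseconds
    (0, 50_000),     # canceled after 50 milliseconds
    (0, 900_000),    # canceled after 0.9 seconds
    (0, 2_000_000),  # canceled after 2 seconds
    (0, None),       # never canceled
]

thresholds = [("10 us", 10), ("100 ms", 100_000), ("1 s", 1_000_000)]
results = {label: fraction_canceled_within(orders, limit) for label, limit in thresholds}
print(results)
```

The same function could be pointed at execution times instead of cancel times to ask how quickly quotes are hit after they are posted.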
So that’s just one example of how we are using MIDAS for market structure research that may better inform significant policy-related questions. Look to this space for a lot more details on this in the coming months and quarters.
Now let’s move up the list to the second topic I mentioned – forensic analysis of market events. Though the original impetus of MIDAS was to be able to more readily reconstruct market-wide events (à la the Flash Crash), we have found it quite useful in examining the various individual-stock events that have captured the attention of the markets as well as the media. I am of course talking about so-called mini-flash crashes.
When I last discussed this topic at a SIFMA conference, I indicated that many of these incidents seem to involve fat-fingered issues, mis-specified orders, or similar causes. At that time our assessment of any given event was based on follow ups with the exchanges, FINRA, and sometimes even the party originating the orders.
But now that we have MIDAS, we can examine these in a lot more detail. In particular we can couple what we learn about the root cause of an issue using our regulatory authority, with a data-driven analysis of how the event played out in the market, and, perhaps most importantly, how the markets reacted.
For example, a few months ago the traded price of a large cap stock suddenly spiked down and then back up in less than a second. The move was not large enough to trigger a circuit-breaker, or affect subsequent trading, but it was large enough to be noticed.
These types of events tend to trigger a lot of media coverage as well as concern by the investing public. And by investing public, I include both market professionals as well as retail investors. Coverage and concern by itself is not necessarily a bad thing, but I do worry about how these events are being interpreted.
It seems that over the past few years a popular meme has emerged: that collectively these sudden price spikes are symptomatic of a “broken” market; that they demonstrate the fragility of our current market structure, that they must be caused by nefarious players; and that they are precursors to another market-wide flash crash, the same way that ground tremors may occur before a large earthquake.
Now that sounds like a terrific story, but is it actually true? What do these events really tell us about the market, and what actions might we want to consider?
Well, let’s consider some of the reasons a large cap stock might suddenly spike down and then back up. One possibility could be that liquidity providers suddenly pulled out of the market (as they did during the Flash Crash). If so, we might consider actions related to the provisions of liquidity.
It also could be that the price declined because an automated trading algo or execution algo went wild and started to send sell orders into the market. If so, that would tell us something about computer-based trading strategies and speed.
Another possibility could be that one or more parties purposely triggered a sell-off or otherwise tried to manipulate the markets. If so, that could suggest potential illegal activities – as I’ve said before, what’s illegal is illegal at any speed.
Or it could be that the sudden sell off was caused by some combination of parties or algos inadvertently piling on in an uncontrollable fashion. That would indeed suggest issues related to complex systems and maybe even suggest market fragility.
My point is that there are many potential causes of such events, and without understanding the cause, it’s very difficult to diagnose the problem, determine its severity, and consider what actions, if any, may be required.
So that’s what we typically try to do: understand the cause. We work with the exchanges and FINRA to properly understand what actually happened, not just what appeared to have happened by looking at the public tape.
And what we generally have found is that sudden spikes are not typically caused by any of the reasons I just mentioned. Rather, they tend to be triggered by old-fashioned human mistakes: a trader sends a large limit order to a market center but inadvertently drops the limit price thereby creating an oversized market order; an investor makes a fat-finger mistake and sends a market order for 100 times more shares than he wanted; a portfolio manager enters a large order into the wrong screen, resulting in an unanticipated request for immediate execution instead of having the flow managed.
Contrary to the public speculation, these specific types of events don’t seem to be typically triggered by proprietary, high-speed algorithms, by robots gone wild, or by excessive order cancellations. As you all know, even the issues with Knight last year were not caused by high-frequency trading algos but rather, as has been extensively reported, by technical issues with routing customer order flow.
Now a skeptic may question whether or not these events, even if they are not necessarily triggered by high-speed trading algorithms, are nonetheless amplified by them, or by other market practices.
For instance, what if small mistakes are quickly cascading into larger market moves because algos are piling on? Or perhaps market participants are quickly canceling their orders as prices fall, leading to a larger-than-expected decline.
These are great questions to ask. And fortunately, we now have tools to begin exploring the answers. Using MIDAS, we can analyze the order books of a stock before, during, and after any event. This is a powerful capability that will allow us to better determine the extent to which markets are resilient to human mistakes, or susceptible to such.
Though we’ve only just started to undertake these types of analyses, so far in the few cases we’ve looked at where prices very quickly spike down and recover, we have not seen instances of piling-on, or existing orders being withdrawn.
In one recent instance we analyzed, we actually found the reverse to be true. A careful review of order book data from one of the exchange feeds revealed that just before the price fell, there was only sufficient liquidity to support two-thirds of the incoming sell orders triggered by a fat-finger mistake. So how come the price did not fall further? In this instance it appeared that not only did market participants not cancel their prior orders to get out of the way, but that at least some participants actually added liquidity and “caught” the incoming order.
Examples like this just don’t support all the rhetoric about the nature of mini flash-crashes.
Let me be clear, I believe that there are many questions we must continue exploring about the nature of high-speed trading, the automation of our markets, liquidity provision, and how all this relates to spreads, investor costs, market efficiency, and even fairness. These are complex questions and, as mentioned, we are undertaking some very significant analyses to better understand the issues in ways that will (hopefully) better inform policy.
But for those who look to, report on, and try to use instances of mini-flash crashes as clear and incontrovertible evidence of the problems with high-frequency trading, high-speed markets, fragility, and impending doom, I think you may be looking in the wrong places. Why does this matter? Because if we don’t diagnose problems correctly, we certainly won’t arrive at the correct prognosis.
So what is the correct diagnosis when it comes to mini-flash crashes? I think what we are seeing is the impact of something that, to me, is much more disturbing. What we are seeing is the result of sloppiness combined with a lack of checks and balances. And the reason I think this is more disturbing is because in this day and age, there should be no excuse for these types of mistakes, especially considering the negative impact these events have on investor confidence.
Earlier, I mentioned that we were collecting about a billion records per day from the MIDAS prop feeds. Do you know how large a billion is? Well, if you lined up all the records of every order, every cancel, every trade, and every quote in the lit markets end-to-end, they would fill up all the space…in a smartphone.
That’s right, a day of market activity, properly compressed, is less than the 64GB of memory I carry with me in my coat pocket. Given today’s technology, a billion records just isn’t as large as it used to be. How come we can stream, in near perfect fidelity, the HD version of Spiderman to an iPad while traveling at 35,000 feet across the country in seat 12E, but we can’t reliably check to make sure a market order for 500,000 shares of IBM is not supposed to really be an order for 500 shares?
These are most certainly solvable issues, but only if we spend the time to address them. And by we, I mean you – the people in this room, because the solution ultimately lies in using technology to better filter and, if needed, prevent, humans from continuing to make market-moving mistakes. In fact, definitively addressing these types of glitches and snafus may do more to improve investor confidence than any other set of actions undertaken by the markets.
As regulators, the stage has already been set. In 2010 the Commission adopted Market Access Rule 15c3-5. Though this sounds technical, the idea is simple: broker-dealers with direct market access should have risk management controls and supervisory procedures reasonably designed to systematically prevent erroneous orders they, or their customers, might send to the market.
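To make the idea concrete, here is a deliberately simple Python sketch of one kind of pre-trade check: a per-order share limit that would stop a 500,000-share market order when 500 shares were intended. The `Order` type, the `pre_trade_check` function, and the 10,000-share limit are all hypothetical; the rule does not prescribe this or any other specific control, and real risk systems are far more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Order:
    symbol: str
    shares: int
    order_type: str  # "market" or "limit"


def pre_trade_check(order, max_shares):
    """Return (accepted, reason) for a hypothetical per-order size limit."""
    if order.shares <= 0:
        return False, "share quantity must be positive"
    if order.shares > max_shares:
        return False, f"{order.shares} shares exceeds limit of {max_shares}"
    return True, "ok"


MAX_SHARES = 10_000  # hypothetical limit chosen only for this example

intended = Order("IBM", 500, "market")
mistyped = Order("IBM", 500_000, "market")

ok_result = pre_trade_check(intended, MAX_SHARES)
bad_result = pre_trade_check(mistyped, MAX_SHARES)
print(ok_result)
print(bad_result)
```

A static limit like this is the bluntest possible filter; the point is only that the check itself is cheap compared with the cost of the mistake it catches.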
Complementing the Market Access Rule, the Commission recently proposed Regulation SCI, which stands for Systems Compliance and Integrity. This proposal calls for market centers to establish, maintain, and enforce written policies and procedures reasonably designed to ensure that their systems have levels of capacity, integrity, resiliency, availability, and security adequate to maintain operational capabilities and promote the maintenance of fair and orderly markets.
Between proposed Reg SCI and Rule 15c3-5, the message to all major market centers and their customers is this: with the use of technology comes the responsibility to ensure that systems are working as intended, and that market activities, especially those to do with order generation, routing and flow, are as robust and reliable as we’ve come to expect from the world’s largest and most-watched capital market system.
In the remaining time I have I’d like now to move to the first, and perhaps trickiest, of the three topics – real-time monitoring of the markets. MIDAS was designed to provide us with access to the prop feeds and the public tape throughout the day as markets move. Ultimately, we would like to use the real-time nature of MIDAS data for intraday alerts of abnormal, or outlier, activities that may deserve further follow-up.
What makes this process tricky is not necessarily the availability of data, or even the underlying technologies (though building a reliable and meaningful real-time monitoring system is a huge task). Rather, the tricky part is figuring out what is abnormal or outlier activity, and what is just an uncommon activity.
Many people would say that an outlier activity is something that occurs only 1% of the time, or maybe even 0.1% of the time. But remember we have a billion records a day. If outliers are defined at the 0.1% level, that would imply one million potential outlier events.
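The arithmetic is worth spelling out. The short Python sketch below computes how many records a naive percentile rule would flag each day, using the round figure of one billion records per day quoted above; the function name is an assumption made for this example.

```python
RECORDS_PER_DAY = 1_000_000_000  # approximate daily volume quoted in the text


def flagged_per_day(records, one_in):
    """Number of records flagged if 1 record in every `one_in` is an outlier."""
    return records // one_in


at_one_percent = flagged_per_day(RECORDS_PER_DAY, 100)       # 1% level
at_tenth_percent = flagged_per_day(RECORDS_PER_DAY, 1_000)   # 0.1% level

print(f"1% level:   {at_one_percent:,} flagged records per day")
print(f"0.1% level: {at_tenth_percent:,} flagged records per day")
```

Even at the stricter 0.1% level, the rule produces a million candidate events per day, which is why a threshold on rarity alone cannot serve as a monitoring program.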
The fact is that simple broad-brush strokes won’t teach us much. The vast majority of outlier events are nothing more than just that – outlier events. As humans, it is in our DNA to try to interpret all such events, look for meaning, and take action. It’s why we see faces in clouds, and now, as recently reported, rodents on Mars.
Monitoring therefore takes a lot of forethought, and a good monitoring program requires detailed knowledge of market structure so you can make intelligent choices about events you flag, and events that are less interesting. The exchanges and FINRA know this well, as they have the front-line responsibilities for monitoring the markets. As we build out our own programs we will likely focus on particular areas of concern and avoid duplicating any existing processes by our fellow regulators.
It is also important to note that though MIDAS gives us a lot more than we’ve ever had at the SEC, it nevertheless does not provide us with a complete view of the markets. We see what you see since we are collecting data from public and commercial feeds. MIDAS does not contain non-public data. As such, there is no information on off-exchange quoting and nothing about how orders are routed, re-routed, aggregated, and disaggregated across broker-dealer systems. Orders are not tagged with the names of each broker-dealer, and trade executions don’t come with account numbers or customer identifiers.
This is why last year the Commission adopted rules requiring the SROs to develop a Consolidated Audit Trail, or CAT, for regulatory use. By design, this system will provide a much more complete view of the equity and equity options markets, and will include such data elements as customer ID.
As many of you know, the SROs have been engaged in a very public process regarding the development of their CAT plan – they’ve held numerous meetings with market participants, issued their own RFP, and even have a dedicated web site where the public can track progress. My colleagues and I have been following the process and are very much looking forward to receiving the final proposed plan, which will of course be published for notice and comment.
In some respects, MIDAS is a bit of a precursor to CAT, at least for those of us performing the analyses and buried deep in the data. As such, it’s really just the beginning of an exciting trend in market regulation that combines advanced technologies and data, with, most importantly, talented and dedicated staff with a passion for understanding the markets. And for technologists and quants, the SEC has become an incredibly interesting place to work.
So let me conclude by saying, if you share our passion for tackling some of these complex and interesting challenges, now is a great time to consider joining the SEC. We are no longer your father’s securities regulator. We now have the analytical tools, the data and the mission-focus to enable talented and dedicated professionals to pursue exciting and rewarding careers in quantitative analysis for the purpose of protecting and enhancing the integrity of our financial markets.
1 Findings Regarding the Market Events of May 6, 2010 – Report of the Staffs of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues, September 30, 2010 (http://www.sec.gov/news/studies/2010/marketevents-report.pdf)
2 Albert J. Menkveld and Bart Zhou Yueshen, Anatomy of the Flash Crash, April 2013.