The Commission’s Production and Use of Structured Data
Mark J. Flannery
Chief Economist and Director
Data Transparency Coalition’s Fall Policy Conference, Washington, DC
Sept. 30, 2014
Thank you so much for inviting me here to speak on open data in financial regulation. Before I begin my remarks, I must make clear that the views I express today are mine alone and do not necessarily reflect the views of the Commission or of my colleagues on the Commission Staff.
I must also explain that I’m new to the Commission and have been serving as its Chief Economist for less than a month. But I’m no stranger to the importance of data, and of structured data in particular. Indeed, I am keenly aware of the important role that high-quality financial information plays in the efficient operation of capital markets and their oversight by regulators. I have been researching and writing about issues related to financial intermediation for more than 30 years. Without access to useable and high-quality financial market data, much of my work would have been impossible.
Today, I would like to speak about the Commission’s efforts to expand the public’s access to the financial data submitted to the SEC by market participants. I hope to impart several broad messages. Most importantly, it is already evident to me that the Commission and its staff have a sophisticated understanding of structured data. Note that I am consciously using the broader term “structured data,” rather than referencing any particular technology for structuring information. Indeed, Commission staff is expert in the various models available to produce financial disclosures in standardized formats that make those disclosures broadly useable by financial market participants. Making useable data available to the public is a key function of many of the Commission’s disclosure rules, and one of the strategies identified in the Commission’s Strategic Plan.
But it isn’t just the public who uses that data. The Commission staff are huge consumers of structured data. This is particularly true of staff in the Division I now oversee: the Division of Economic and Risk Analysis. We use financial data from a variety of sources in our economic analysis of rules, risk assessment and market supervision initiatives. We also rely on financial data to support enforcement actions and compliance programs. DERA’s extensive use of structured data enhances our understanding of the best methods and practices for its collection. At the Commission, the most notable and frequently discussed data source consists of the disclosures in the forms and filings required under the SEC’s reporting rules. But we also frequently use structured data purchased from commercial sources, produced by self-regulatory organizations, or obtained under our supervisory authority as part of our examinations and investigations. It is important to recognize that all of these sources are vital to the agency’s mission-critical activities.
The SEC’s Historical Commitment to Structuring Data
I’d like to begin by providing a little bit of history about the accessibility and usability of information on the financial market participants overseen by the SEC. A good place to start is with EDGAR. This is the name so commonly associated with public access to financial disclosures. But I’d wager that few people know that the acronym stands for Electronic Data Gathering, Analysis, and Retrieval. EDGAR was launched in 1994 and fully phased in for periodic reports of corporate issuers by 1996. Since then, a myriad of new financial disclosures, including from investment advisors and investment companies, have been made available on the EDGAR platform.
I think it is reasonable to claim that EDGAR is one of the most important innovations to financial disclosure in the history of financial market regulation. With its establishment, information on public companies was simultaneously available to all market participants in a common form. This single technological change eliminated many of the distinctions that had previously privileged some financial market participants over others. Gone were the days of mail, microfiche, and the need to have a physical presence at a location where this information was made available in non-electronic form. Instead, financial reports became equally available to anyone with a web browser.
EDGAR is still running strong today, and although some might say that EDGAR is an outdated mode of disclosure in an era of lightning-fast communication, its broad reach and fundamental egalitarianism remain. Data is freely available in real time to all market participants, who in aggregate electronically requested more than 230 million corporate issuer financial reports and registration statements during a four-year period surrounding the financial crisis. Moreover, countless data vendors have built robust business models around structuring these disclosures for commercial purposes. Some of these business models further improve the availability of data to investors – think Google and Yahoo Finance.
Modernizing EDGAR has taken several forms. Perhaps the most important has been to make the information in companies’ reports available in a machine-readable format that enhances the usability of required financial disclosures. This effort began more than a decade ago when Forms 3, 4, and 5 – disclosures by corporate insiders of changes in their ownership – were required to be filed in a custom implementation of eXtensible Markup Language (XML). Each of these forms is assigned a set of predefined “tags” that describe how content will appear in a browser. These tags can also be used to identify and extract the financial elements from each filing for incorporation into a common database of like filings.
In 2008, the Commission required filers of Form D to follow a similar XML format. Form D reports information about unregistered offerings of securities claiming a Regulation D exemption. This part of the (private) financial markets facilitates the formation of more than 1 trillion dollars in new capital each year, but with a median offering of only a million dollars. It should thus be clear that this offering method is particularly important for small companies’ capital-raising efforts. As a result of this standardized disclosure, investors – even those with minimal technical skills – can now extract this machine-readable information and track aggregated private offering activity. Recent Commission staff white papers have used this information to highlight and understand the remarkable nature of this segment of our financial markets.
In 2009, the Commission began to require that companies provide their financial statements using eXtensible Business Reporting Language (XBRL). As this audience knows well, XBRL is a global XML standard that focuses on the requirements of business reporting. In particular, XBRL allows financial information to be structured according to a taxonomy of financial items that follow agreed-upon standards, such as U.S. GAAP.
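To illustrate how predefined tags allow the elements of an insider ownership filing to be identified and pooled into a common database, here is a minimal Python sketch. The sample document and its element names (`ownershipDocument`, `transaction`, `pricePerShare`, and so on) are hypothetical simplifications chosen for illustration; they are not the Commission's actual ownership-form schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified filing. The element names are illustrative
# assumptions, not the actual schema used for Forms 3, 4, and 5.
SAMPLE_FILING = """
<ownershipDocument>
  <issuerTradingSymbol>XYZ</issuerTradingSymbol>
  <reportingOwnerName>Jane Doe</reportingOwnerName>
  <transaction>
    <date>2014-09-02</date>
    <shares>1000</shares>
    <pricePerShare>12.50</pricePerShare>
  </transaction>
  <transaction>
    <date>2014-09-15</date>
    <shares>500</shares>
    <pricePerShare>13.00</pricePerShare>
  </transaction>
</ownershipDocument>
"""


def extract_transactions(xml_text):
    """Return one flat record per tagged transaction in the filing."""
    root = ET.fromstring(xml_text)
    symbol = root.findtext("issuerTradingSymbol")
    owner = root.findtext("reportingOwnerName")
    rows = []
    for txn in root.findall("transaction"):
        rows.append({
            "symbol": symbol,
            "owner": owner,
            "date": txn.findtext("date"),
            "shares": int(txn.findtext("shares")),
            "price": float(txn.findtext("pricePerShare")),
        })
    return rows


rows = extract_transactions(SAMPLE_FILING)
# Because every filing uses the same tags, records from many filings
# can be appended to one table and aggregated.
total_value = sum(r["shares"] * r["price"] for r in rows)
print(len(rows), total_value)
```

Because each record carries the same fields, rows extracted from many such filings could be appended to a single table and compared directly, which is the practical benefit of a common tagged format.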
XBRL appears today in more than 20 SEC forms, each with its own specific taxonomy. For example, the Commission’s rules require operating companies to submit files containing tagged data using XBRL for quarterly and annual financial statements, including footnotes and schedules. In addition, mutual funds are required to submit exhibits containing risk/return summary information using XBRL.
Today, evaluating the appropriateness of machine-readable formats for financial disclosures is a routine part of the rulemaking process. DERA staff works closely with the Commission’s rulewriting divisions and the Office of Information Technology to facilitate this process. The Commission has incorporated the collection of structured data into several recently adopted and proposed rules.
For example, the Commission recently adopted amendments to Regulation AB, the regulation governing the issuance of asset-backed securities. The amendments require the disclosure of loan-level information in a machine-readable format for registered offerings of certain asset-backed securities. As a result, investors can perform their own quantitative assessments of the underlying risks of the securitizations. Another example relates to money market funds. Since 2011, these funds have been required to disclose their portfolios monthly on Form N-MFP. The Commission has also recently proposed new rules governing “crowdfunding” and amendments to Regulation A, which would require issuers to file certain key financial information in an XML format.
These examples illustrate that structured data reporting provides easier access for the Commission and investors to key information about security offerings. Structured data also enables easy comparisons among available investment opportunities. Going forward, I expect that the Commission will continue to solicit comment from the public on whether and how to standardize disclosures in rules and make them conducive to the collection of structured data.
Remaining Aware of the Costs and Benefits of Structured Disclosures
It should be apparent by now that my overarching theme today is that structured data of the sort just discussed is extremely valuable both to the Commission and the public. Yet there remain common misperceptions of what structured data can do and the ease with which it can be produced. This issue is most commonly brought up in the context of how best to capture the required disclosures by SEC registrants. In recent years, there have been calls to increase the amount of financial information made available to investors in structured format. While adding structure to many types of financial information is theoretically possible, there are practical limits to the usefulness and desirability of doing so. Some financial information is straightforward to structure, like the face financials of corporate issuer filings or the securities holdings of investment managers. Other important financial information often comes as narrative disclosures – such as footnotes to the face financials or the management discussion of risk factors in a prospectus filing. How to efficiently structure this type of information requires careful thought.
As a general matter, to maximize the benefits of structured data, the Commission must be mindful of various trade-offs when incorporating such requirements into its rules. The staff must identify not only what information should be disclosed on various forms, but also what elements of that information should be “tagged” so that they can be extracted in a structured data format. This requires both anticipating what information would be most useful to investors, and also determining the most useful form of its disclosure: should the required data format be XBRL, custom XML, or some other standardized data format? Deciding on the right data format involves many considerations, including the complexity of the financial information, need for validation of the reported elements, and the availability of pre-existing industry standards. And all of these considerations may be influenced by the information needs of particular financial markets.
Moreover, it is important to be careful when incorporating structured data formats into new disclosure requirements to avoid unnecessary implementation challenges. Indeed, the Commission has taken implementation issues into account as it has required structured disclosures. As I mentioned previously, the SEC first required the submission of financial statements in XBRL in 2009. At that time, standard tools and vendor services were not yet widespread, and most issuers did not yet have experience with the GAAP taxonomy. To facilitate a smooth transition to the new rules, the SEC phased in the reporting requirements over five years based on issuer size and level of tagging detail. Specifically, the implementation started with large firms and basic tagging of the face financials only. This was intended to alleviate some of the initial compliance cost concerns. It also provided time for vendors to develop more sophisticated services, and it allowed smaller issuers to benefit from the implementation lessons learned by the larger issuers. And that learning process continues to this day.
Continually Enhancing the Quality of XBRL Filings
As many of you may know, XBRL includes a standard set of more than 10,000 tags for the U.S. GAAP taxonomy. Filers may use a custom financial element tag for a balance sheet, income statement or statement of cash flow item when a suitable tag is not already available. The more frequently filers select from among the standard tags, the more easily investors, analysts, and financial market researchers can make inter-company comparisons of the financial disclosures. Unnecessarily high usage of custom tags by a filer can therefore impair certain financial analyses. Commission staff is aware of these issues. DERA staff recently performed and published on the SEC website an analysis of the usage of custom tags from a sample of filings over time. We found a consistent and gradual decline in custom tag rates among large and midsized companies in each year since the implementation of the rules. This is great news for the usability of that data. However, our analysis found that smaller companies continue to have high custom-tag rates. DERA staff analysis of calculation errors in XBRL submissions – i.e., failing to provide the required calculations for the face financials – also showed a higher incidence among smaller companies.
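As a rough illustration of what a custom-tag rate measures, the sketch below computes the share of a filing's tags that fall outside the standard taxonomy. It assumes that standard tags carry the `us-gaap` prefix and that any other prefix marks a filer-specific extension; the sample tag list, the `abc` prefix, and the function name are hypothetical, and this is not a description of the methodology DERA staff actually used.

```python
def custom_tag_rate(tags, standard_prefixes=("us-gaap",)):
    """Fraction of tags whose prefix is not in the standard set.

    Assumption: a tag is written as "prefix:ElementName", and any
    prefix outside standard_prefixes marks a filer-defined element.
    """
    if not tags:
        return 0.0
    custom = [t for t in tags if t.split(":", 1)[0] not in standard_prefixes]
    return len(custom) / len(tags)


# Hypothetical tags from one filing's face financials.
filing_tags = [
    "us-gaap:Assets",
    "us-gaap:Liabilities",
    "us-gaap:Revenues",
    "abc:SpecialLicensingRevenue",  # filer-defined extension element
]

rate = custom_tag_rate(filing_tags)
print(rate)
```

A lower rate means more of the filing's items map to elements that other companies also use, which is what makes inter-company comparison easier.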
Other analysts in industry and academia have found similar trends in smaller firms’ XBRL reports. What can we do to improve the XBRL reporting standards? Well, of course, there are two views. Some observers propose more aggressive supervision and enforcement of compliance in order to increase the quality and usability of the information. Others advocate an exemption from the requirements for smaller companies to reduce the financial burden associated with their compliance.
My view is that the Commission is already taking the most prudent course, continuing its efforts to monitor filing quality and educate filers. As with all new compliance experiences, time is required for sufficient learning to overcome the inevitable start-up problems and costs that companies incur. It is very encouraging that large and midsize companies are demonstrating continuous improvement. Smaller companies have had less time to comply. We should not be surprised at this point that their improvement is slower, given their more limited resources. Our staff analysis shows that there continues to be significant innovation in the XBRL-related services industry – there are currently more than 30 third-party XBRL providers compared to 11 in 2009. Moreover, the creation of tagged data output has resulted in greater automation within the internal reporting process at companies that can now use new vendor products that integrate XBRL tagging into their financial reporting tools. As these providers’ product and service offerings continue to improve and their market shares continue to shift to reflect their relative improvements, we should expect to see tangible benefits among filers of all sizes.
Moving forward, DERA staff will continue to address the quality of XBRL submissions by periodically analyzing their content for accuracy and completeness. And where appropriate, DERA staff will work closely with the Division of Corporation Finance to provide guidance to filers based on these observations. This effort will occur through DERA’s Office of Structured Disclosure – the newly renamed successor office to the Office of Risk Assessment and Interactive Data.
Much of the focus will be on how companies, and in particular smaller companies, are complying with the requirements, and on assessing whether the quality of their filings begins to follow the trend of quality improvements shown by midsize and large companies. Helping smaller companies to improve the quality of their data is important because their ability to disseminate machine-readable financial information critically enhances their ability to access capital in financial markets. Without reliable structured data for small companies, data aggregators, financial service providers, analysts, and investors will be less able to compile the relevant information for the purpose of cross-company analyses. Moreover, having this financial information available for all SEC reporting companies is key for accurate estimates of company risk, which improve as the amount of information available on market participants increases. Ensuring that market participants have access to useable, high-quality information about these smaller companies is an important goal articulated by the Commission in 2009 when it required the submission of tagged financial statements. We in DERA are committed to helping fulfill that original aim.
Hence, expect to see more staff observations and updates of filer practices posted on the SEC website. DERA staff will continue its outreach to corporate filers through seminars, webinars, conferences, and other educational programs. DERA staff are also exploring ways to make aggregated XBRL data available to investors and financial researchers so that they can more easily access and analyze the financial information reported through XBRL submissions.
Finally, we continue to seek new ways to facilitate the submission and use of structured financial data. For example, DERA staff is working with outside contractors on “Inline-XBRL.” Consistent with its name, this new technology would allow companies to integrate (or embed) the XBRL tagging of the financial statements directly into their standard HTML-formatted 10-K and 10-Q filings. This effectively eliminates the need to reconcile separate HTML and XBRL versions of the financial statement content, thus reducing the possibility of rekeying or similar errors. Work is also proceeding on a prototype viewer that would allow users to display and search the integrated XBRL tagging while viewing the familiar HTML view of the financial statements. In short, then, SEC staff are committed to improving the availability of financial information through the presentation and analysis of structured data.
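To show what embedding tags directly in an HTML filing could look like, the following Python sketch reads tagged values out of a small XHTML fragment. The remarks do not describe the markup, so the fragment is an assumption modeled loosely on the `ix:nonFraction` element of the Inline XBRL specification; the sample page, the namespace constant, and the `extract_facts` helper are all illustrative, and a real filing would carry additional context and formatting detail omitted here.

```python
import xml.etree.ElementTree as ET

# Assumed Inline XBRL namespace; the sample page declares the same URI.
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical fragment: one document serves as both the human-readable
# HTML view and the machine-readable tagged data.
SAMPLE_PAGE = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <table>
      <tr>
        <td>Total assets</td>
        <td><ix:nonFraction name="us-gaap:Assets" contextRef="FY2013" unitRef="USD">1,250</ix:nonFraction></td>
      </tr>
      <tr>
        <td>Total liabilities</td>
        <td><ix:nonFraction name="us-gaap:Liabilities" contextRef="FY2013" unitRef="USD">800</ix:nonFraction></td>
      </tr>
    </table>
  </body>
</html>
"""


def extract_facts(xhtml_text):
    """Map each tagged element name to the number displayed on the page."""
    root = ET.fromstring(xhtml_text)
    facts = {}
    for el in root.iter("{%s}nonFraction" % IX_NS):
        # The displayed text is the reported value; strip thousands separators.
        facts[el.get("name")] = int(el.text.replace(",", ""))
    return facts


facts = extract_facts(SAMPLE_PAGE)
print(facts)
```

Since the number a reader sees and the number a program extracts come from the same element, there is no second copy to reconcile, which is the source of the reduction in rekeying errors.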
In conclusion, let me thank you again for the invitation to speak with you today about such important issues. And thank you for your attention.
 The Securities and Exchange Commission, as a matter of policy, disclaims responsibility for any private publication or statement by any of its employees. The views expressed herein are those of the author and do not necessarily reflect the views of the Commission or of the author’s colleagues upon the staff of the Commission.
 See http://www.sec.gov/about/sec-strategic-plan-2014-2018-draft.pdf.
 DERA Staff analysis of the EDGAR log files from January 2008 through March 31, 2013. Estimate is based on investor requests through the sec.gov web portal for the following form types: 10-K, 10-Q, 20-F, 40-F, S-1, S-2, and F-1.
 See Release No. 33-9638 (Sep. 4, 2014), Asset-Backed Securities Disclosure and Registration.
 See Release No. 33-9470 (Oct. 23, 2013), Crowdfunding.
 See Release No. 33-9497 (Dec. 18, 2013), Proposed Rule Amendments for Small and Additional Issues Exemptions Under Section 3(b) of the Securities Act.
 Columbia Business School White Paper, “An Evaluation of the Current State and Future of XBRL and Interactive Data for Investors and Analysts,” available at http://www4.gsb.columbia.edu/filemgr?&file_id=7313156.
 See Release No. 33-9002 (Jan. 30, 2009), Interactive Data to Improve Financial Reporting.