
The Promise of Structured Data: True Modernization of Disclosure Effectiveness

Nov. 17, 2020

Remarks at the XBRL US Investor Forum 2020: Ready for Anything – Using Data in Perilous Times

Thank you, Al [Berkeley], for the introduction, and thank you to Campbell [Pryde] and Michelle [Savage] for the invitation to be here today. I appreciate the work of this organization and its support of digital business reporting, including enhancing the quality and usability of data in SEC filings for investors, regulators, academics, and the public more broadly.[1]

The SEC began its first foray into making data electronically accessible back in 1984 with the introduction of EDGAR as a pilot program. The SEC’s annual report that year stated that EDGAR was designed to “accelerate dramatically the filing, processing, dissemination and analysis of corporate information; revolutionize the manner in which many investment decisions are made and executed; and contribute to the efficiency of the securities markets.”[2] By 1996, EDGAR filing was required for all domestic issuers.[3] Prior to that time, well, you could come knock on our library door and take a seat in the reading room to leaf through paper filings if you wanted financial market information. As antiquated as that sounds, I’m sure I’m not the only one among us today who spent time in that reading room doing just that.

In fact, as I began to prepare my remarks for today and dig into the history of electronic data at the SEC, my counsel asked me whether I had any good stories from early in my career dealing with hard copy document review or research using microfiche and the like. Little did she know that I learned to type on a manual typewriter and made copies using something called a mimeograph stencil duplicator. So, yes, I’ve got some epic low-tech/no-tech stories. That background only makes the value of digital data crystal clear to me.

I’ve dated myself by telling that story to highlight the sea change that occurred with electronic filings. The next frontier in this space has been structuring the data within these filings. As you all know, the Commission’s history with structured data dates back to the early voluntary XBRL program adopted in 2005 for public company financial statements. The voluntary program was expanded to include mutual fund prospectus information in 2007. And those voluntary programs were made mandatory in 2009, along with some additional requirements for credit rating agencies that year.[4] The Commission took another important step forward in 2018 when it amended its XBRL requirements for financial statement and prospectus information to require the use of Inline XBRL.[5] This technology, which involves embedding XBRL into disclosure documents so they are both human- and machine-readable, is expected to reduce costs and improve the quality of structured data.
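
To make the "both human- and machine-readable" point concrete, here is a minimal sketch in Python of how software might pull a tagged figure out of an Inline XBRL page. The document fragment, the context name `FY2019`, and the helper `extract_numeric_facts` are hypothetical and exist only for illustration. The sketch assumes the displayed text is already a plain number and ignores the format transformations, contexts, and units that a real Inline XBRL processor would resolve.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Namespace of the Inline XBRL 1.1 elements (ix:nonFraction and friends).
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical, heavily simplified fragment: a person sees "1234" on the
# page, while the surrounding tag tells software what the number means.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Revenues (in millions of dollars):
      <ix:nonFraction name="us-gaap:Revenues" contextRef="FY2019"
                      unitRef="USD" scale="6" decimals="-6">1234</ix:nonFraction>
    </p>
  </body>
</html>"""


def extract_numeric_facts(xhtml):
    """Return one dict per ix:nonFraction element found in the document."""
    root = ET.fromstring(xhtml)
    facts = []
    for el in root.iter(f"{{{IX_NS}}}nonFraction"):
        shown = (el.text or "").strip()
        # 'scale' is a power of ten applied to the displayed number.
        scale = int(el.get("scale", "0"))
        facts.append({
            "concept": el.get("name"),
            "context": el.get("contextRef"),
            "unit": el.get("unitRef"),
            "displayed": shown,
            "value": Decimal(shown) * (Decimal(10) ** scale),
        })
    return facts


facts = extract_numeric_facts(DOC)
for fact in facts:
    print(fact["concept"], fact["displayed"], fact["value"])
```

The same file serves both audiences: a browser renders the paragraph for a human reader, while a program reads the concept name and scaled value without any screen scraping.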

The introduction of structured data requirements for financial statement and prospectus information has had significant benefits for investors, analysts, and other market participants, making it easier and less costly to extract, filter, compare, and otherwise analyze the information in SEC filings. It has facilitated comparison across time periods, across companies, and even between data in SEC filings and other agency filings. It has enabled more sophisticated analysis, including by our own agency staff, as increased structured data has underpinned our own efforts to incorporate data analytics into our processes.[6] Structured data in our filings also enables robust academic study.[7] All of which contributes to the efficient operation of the capital markets and their oversight by regulators.

And it isn’t just those who can afford expensive technology that benefit from structured data. Retail investors realize the benefits as well in multiple respects. First, most retail investors access the market through large institutional investors, and those investors are more likely to leverage structured data in managing their investments.[8] But, in addition, retail investors benefit as downstream users of tools that rely on structured data, and as consumers of analysis that is facilitated by structured data.[9] Also, the Commission staff extracts and makes certain data sets available online to facilitate open access.[10]

Structured data is a pivotal factor in ensuring that EDGAR continues to live up to its original promise of revolutionizing investment decisions and enhancing the efficiency of the securities markets. But, just as financial market participants continue to innovate and evolve with respect to the data they generate and use, the SEC must keep pace and continually evaluate how to enhance the usability and quality of data in our filings. We’ve done a lot of rulemaking recently in the name of “modernization” and disclosure effectiveness, and achieving modernized, effective disclosure should include thoughtful consideration of structured data standards like XBRL. 

Today I want to highlight three areas of focus as we continue to consider disclosure effectiveness and modernization. First, expanding the use of XBRL outside of the financial statements, specifically to proxy data, climate change and other environmental, social and governance (or ESG) data, Management’s Discussion and Analysis (or MD&A), and earnings releases. Second, continued emphasis on improving the accuracy and quality of structured data. And third, the benefits of open, freely available financial identifiers.

Beyond Financial Statements

Structured data requirements have thus far been significantly concentrated in financial statement information. Given the nature of the data in financial statements and its central importance in the disclosure regime, it made sense to essentially start there. Now, however, it’s time to take what we’ve learned from this process and consider its application in other areas.

We did that in 2019 when we required the structuring of information in the cover pages of periodic filings.[11] In many other rulemakings, however, we have considered but declined to propose or adopt structuring requirements, citing the costs to issuers.[12] That is, of course, a significant factor to quantify and consider, but we must weigh that consideration against the costs to investors and to market efficiency more broadly when we don’t require structured data. Especially as we see the costs of tagging going down,[13] we must ensure that the costs on both sides of the equation are carefully analyzed and considered. As we continue to modernize, we should consider obvious places where structuring could be relatively simple and would provide significant transparency benefits.

Consider, for example, proxy voting data. Dodd-Frank directed the SEC to issue rules requiring institutional investment managers to report annually how they voted proxies relating to executive compensation, or so-called “say on pay” votes.[14] In 2010, the SEC issued a proposed rule pursuant to that mandate.[15] Under the proposal, Form N-PX, a form currently used by investment companies to report their proxy votes, would have been amended to accommodate these new reporting requirements by institutional investment managers. In connection with that proposal, we asked questions regarding the prospect of structuring the Form N-PX voting data. Ten years later, however, we still have not finalized that rule, nor have we gone forward with efforts to structure voting data reported on Form N-PX.

What kind of data are we talking about here? The most basic information that an investor might want: how their money is being voted in corporate elections, and whether their shares are being voted in their best interest or in accordance with their instructions. We could bring much greater clarity and transparency to investors regarding how their voting rights are being exercised with the simple expedient of finalizing this rule and adding a requirement, as discussed in the proposal, to tag the Form N-PX voting data.   

N-PX filings are voluminous in nature but would likely require relatively few, straightforward data tags. Thus we could potentially take a large body of important information and dramatically increase its usability through a relatively simple taxonomy.  

Another area that could benefit from structured data to support usability and comparability is climate change and other ESG risks and impacts. As you all know, climate and other ESG-related metrics are of ever-increasing importance to investors, surpassing even traditional financial statement metrics for many.[16] Of course, there are currently few, if any, standardized climate or ESG disclosure requirements. Indeed, much of that disclosure occurs voluntarily and outside of SEC filings altogether. As I have said elsewhere, developing standardized climate and ESG disclosure requirements should be a top priority for the Commission.[17] As we consider this, we should also consider how to make the data disclosed under such requirements as usable as possible, including through tagging requirements.

Many of our structuring requirements so far have been backward-looking, requiring us to consider how to structure information that is currently disclosed in a non-structured manner. As we consider new climate and other ESG requirements, we would have the opportunity to simultaneously consider how to make those requirements amenable to structuring. Instead of an ex post facto application of structuring requirements, the two could develop in tandem.

Finally, I’ll just mention briefly, MD&A and earnings releases. As commenters including XBRL have pointed out,[18] disclosures under MD&A may benefit from some simple block tagging that could greatly enhance comparability of certain relatively consistent types of information disclosed in MD&A. And earnings releases, particularly given their often market-moving nature, appear to be another well-suited candidate for tagging.

Enhancing Accuracy and Quality

Now, a second area worthy of focus is how to enhance the accuracy and quality of structured data. This is, of course, ultimately the responsibility of the reporting entity, but there are no specific requirements for third-party validation or verification of data tags. Indeed, in its 2018 release adopting Inline XBRL, the Commission specifically declined to require any auditor assurance related to XBRL or even transparency around any auditor involvement.[19] As we continue to consider how best to incorporate structured data into various SEC reports, we should also revisit the issue of third-party verification, including whether, in an Inline environment, review of the tags should be part of the audit.

Some research has shown material error rates in data tagged in our filings, including errors in tags that are likely to be crucially important to investors like Revenues, Net Income, and Assets.[20] There are other issues that can affect the quality of data in structured filings, such as the use of customized tags. In recognition that a taxonomy may not capture every data point, filers may use custom or extended data tags outside of an established taxonomy. Custom tags accommodate unique circumstances, but the greater their use, the less comparability is afforded, and the greater the potential for noise in the data.

We have reason to believe there has been some progress in both regards, with studies showing a decrease in overall tagging error rates after 2012, and a potential decrease in the use of custom tags over time as well.[21] I’m sure that is attributable in part to the thoughtful work of FASB and Commission staff in the continued enhancement of the taxonomies available. I also commend the work of XBRL’s Data Quality Committee in the development of tools to help enhance accuracy and quality.

But, as we know, investors and others, including the Commission staff, rely heavily on structured data in order to analyze information in SEC filings. In fact, some recent rulemakings cite the availability of structured data in financial statements as a basis for the proposed elimination of other disclosure requirements. For example, the recent MD&A proposal suggested eliminating the tabular presentation of certain information such as contractual obligations on the grounds that the data in those tables was largely duplicative of data available in the financial statements.[22] Therefore, the reasoning goes, investors could relatively easily compile this information in a tabular format on their own because it is available elsewhere in XBRL format. This reasoning assumes that the data is reliably and accurately tagged.

Because of this level of reliance on structured data, it is crucial that we continue to work to improve its quality. While some progress has been made, I’m interested to hear from all market participants about ways we could potentially verify XBRL tagging in a more comprehensive and systematic way. The quality of structured data is only as good as the quality of the process for producing it. And if it’s worth doing, it’s worth doing right.

Financial Identifiers and XBRL

I know following me is a panel that will discuss the use and adoption of data identifiers such as the Legal Entity Identifier (or LEI) by investors, businesses, and regulators. I note that the LEI in XBRL Working Group published an LEI XBRL taxonomy that can be used in XBRL applications to unambiguously identify companies.[23] The potential to more easily link LEIs within regulatory reports facilitates both research and meeting regulatory requirements such as know your customer and anti-money laundering. Overall, this promises another means of enhancing the usability of data in the Commission’s filings.

The Commission participates in other international efforts to implement uniform identifiers such as the Unique Product Identifier or “UPI.” The UPI and perhaps other unique data identifiers could also be incorporated into XBRL as open, freely available alternatives to proprietary identifiers such as CUSIPs. I look forward to hearing what those panelists have to say.

As the utility of the LEI increases and work on other identifiers progresses, the Commission should carefully consider adapting its regulations and forms to incorporate them. One advantage of these uniform identifiers is that they are open, not subject to commercial licensing, and freely available. It will likely be a while before the UPI is implemented, so I want to highlight another alternative that the Commission has considered, which is whether freely available, open data identifiers for financial products such as the Financial Instrument Global Identifier (or FIGI) should be allowed in reporting. FIGI is an alternative to proprietary identifiers such as the CUSIP or ISIN.[24] Given that there are non-commercial, open, and freely available alternatives to proprietary identifiers, it makes sense to consider how to allow their usage when it comes to regulatory reporting requirements.


The bottom line is we are living in a global economy, and in an era of Big Data. We must keep pace with the sophisticated technology and data produced and used by financial market participants worldwide.  This means carefully assessing when and how best to bring structured data into the disclosure process, how to maximize its reliability, and how to incorporate broadly accessible financial identifiers that complement and enhance the usability of the data. I appreciate the work that XBRL does on all of these fronts and look forward to learning from the panel discussions today. Thank you again for having me.


[1] The views I express today are my own and do not represent those of my fellow Commissioners or the staff.

[3] See Important Information About EDGAR (“Companies were phased in to EDGAR filing over a three-year period, ending May 6, 1996.”).

[4] See Structured Disclosure at the SEC: History and Rulemaking. Also, the Commission required XBRL for certain periodic report cover pages in 2019, and for certain disclosures by registered variable annuity and life insurance separate accounts, registered closed-end funds, and business development companies in 2020. In addition to XBRL requirements, certain other forms including Form N-MFP, Form 13F, and Form D are structured using XML.

[5] See Inline XBRL Filing of Tagged Data, Final Rule, Rel. No. 33-10514 (June 28, 2018) (Inline Adopting Release).

[6] See Scott W. Bauguess, The Role of Machine Readability in an AI World (May 3, 2018) (“From a machine learning perspective, this standardized data can be combined with other relevant financial information and market participant actions to establish patterns that may warrant further inquiry. And that can ultimately lead to predictions about potential future registrant behavior. These are precisely the types of algorithms that staff in DERA are currently developing.”).

[7] See, e.g., Nerissa C. Brown, Shira Cohen & Adrienna A. Huffman, Accounting Reporting Complexity and Non-GAAP Earnings Disclosure, at n.5 (Jan. 2019) (“We also contribute to prior work on the usefulness of XBRL data in capital markets” with research that exploits “an XBRL-based measure of accounting complexity to better understand non-GAAP reporting practices.”); see also Remarks of FASB Chairman Russell G. Golden, XBRL US Investor Forum 2019: Driving Actionable Analytics (Nov. 4, 2019) (“FASB Member Christine Botosan is working on hands-on training sessions to help academics use XBRL data in their research and in their classrooms. There are significant advantages to using XBRL data in academic research. . . . We’re working with the American Accounting Association to identify venues for these workshops. Ultimately, we believe the program will promote increased use of XBRL in relevant academic research and in the classroom.”).

[8] See Rick A. Fleming, The Benefits of Structured Data for Investors (Mar. 24, 2015) (“[I]t is important to remember that an ‘institutional investor’ like a pension fund or mutual fund is essentially just a pool of retail investors. Thus, by giving the buy-side analyst better tools to research the market, the SEC would benefit all the investors in the pool.”).

[9] See Bauguess, supra note 6 (“Many retail investors obtain information they use to inform investment decisions through online, investment-related web sites, like Yahoo!Finance and commercial trading sites. The availability of structured data, which can be used by these commercial tools, means that retail investors will have faster access to data about more companies or investment funds, with reported information potentially at a more granular level.”).

[11] See FAST Act Modernization and Simplification of Regulation S-K, Final Rule, Rel. No. 33-10618 (Mar. 20, 2019).

[12] See, e.g., Update of Statistical Disclosures for Bank and Savings and Loan Registrants, Final Rule, Rel. No. 33-10835 (Sept. 11, 2020) (declining to require disclosure in a machine-readable format and noting “we are cognizant of the additional costs that would be incurred”).

[14] See Reporting of Proxy Votes on Executive Compensation and Other Matters, Proposed Rule, Rel. No. 34-63123 (Oct. 18, 2010) (“Section 951 of the Dodd-Frank Wall Street Reform and Consumer Protection Act  (‘Dodd-Frank Act’), enacted on July 21, 2010, added new Section 14A to the Exchange  Act. Section 14A requires issuers to provide shareholders with a vote on certain executive compensation matters, and it requires certain institutional investment managers to report how they voted on those matters.”).

[15] See id.

[16] See Bank of America/Merrill Lynch, ESG Part II: a deeper dive, Equity Strategy Focus Point (June 15, 2017) (“Prior to our work on ESG, we found scant evidence of fundamental measures reliably predicting earnings quality. If anything, high quality stocks based on measures like Return on Equity (ROE) or earnings stability tended to deteriorate in quality, and low quality stocks tended to improve just on the principle of mean reversion. But ESG appears to isolate non-fundamental attributes that have real earnings impact: these attributes have been a better signal of future earnings volatility than any other measure we have found.”).

[17] See, e.g., Commissioner Allison Herren Lee, Playing the Long Game: The Intersection of Climate Change Risk and Financial Regulation (Nov. 6, 2020); Commissioner Allison Herren Lee, Regulation S-K and ESG Disclosures: An Unsustainable Silence (Aug. 26, 2020).

[19] See Inline Adopting Release, supra note 5, at 25-27 (declining to require auditor assurance and providing that “we are not requiring additional transparency regarding auditors’ responsibilities related to financial statement information XBRL data at this time”).

[20] See Calcbench, The Quality of XBRL Filings (2014) (“[T]here are a non-trivial number of errors in tags which are likely to be heavily used by analysts and investors (e.g., Revenues, Net Income, Assets, etc.)”); see also XBRL US, Aggregated Real-time Filing Errors.

[21] See, e.g., Ariel Markelevich, The Quality and Usability of US SEC XBRL Filings (June 21, 2016) (“Findings suggested that starting in 2012, there has been a steady improvement in the quality and usability of the XBRL filings in most aspects. Additionally, it seems that the lower quality and usability originates in data in the notes to the financial statements and in data filed by smaller companies. The results presented in the paper are consistent with the notion of companies moving along a learning curve and improving the quality and usability of the XBRL data as they gain more experience tagging. These improvements make it easier to use the XBRL filings and reap the benefits offered by this data. However, in spite of the efforts and improvements, it seems like more work is needed to continue improving the quality of the data.”).

[23] See XBRL, LEI Taxonomy finalized (July 3, 2020).

[24] See, e.g., Tailored Shareholder Reports, Treatment of Annual Prospectus Updates for Existing Investors, and Improved Fee and Risk Disclosure for Mutual Funds and Exchange-Traded Funds; Fee Information in Investment Company Advertisements, Proposed Rule, Rel. No. 33-10814 (Aug. 5, 2020) (“Should we permit funds to provide, in lieu of a CUSIP number, other identifiers such as a Financial Instrument Global Identifier (FIGI) for each security?”).
