Improving Disclosure with Smart Data

Rick A. Fleming, Investor Advocate

XBRL US Investor Forum 2016: Finding Value with Smart Data
New York, New York

Oct. 24, 2016

Thank you, Campbell [Pryde], for that kind introduction and for inviting me to participate in this forum today.

For more than 80 years, our securities laws have required sellers of securities to disclose all material facts to prospective investors, and this disclosure requirement is a cornerstone of fair and efficient markets. However, our disclosure delivery methods have not kept pace with changes in technology, and there is much that can be done to improve the delivery of information into the modern marketplace. This forum brings together many thought leaders to continue that conversation, and I look forward to hearing your ideas. Having said that, I must remind you that the views I express are my own and do not necessarily reflect those of the Commission, the Commissioners, or Commission staff.

Some of my SEC colleagues will be speaking later today, so I won’t belabor their points, but in certain areas the Commission is already making great strides to employ data analytics to accomplish its mission of protecting investors, fostering capital formation, and promoting fair, orderly and efficient markets. For example, the Commission’s internally-developed Corporate Issuer Risk Assessment tool aggregates and organizes XBRL-tagged financial data filed by issuers. The Division of Enforcement uses this tool to detect anomalous patterns in financial statements that may warrant further inquiry.[1] Staff throughout the Commission utilize another internally-developed tool, the Financial Statement Query Viewer, to search financial statement data and footnotes across different periods and companies. In addition, the Commission has a text analytics initiative in which staff can identify inconsistencies in narrative disclosures, discrepancies between narrative and numeric disclosures, narrative trending, changes in risk profiles based on sentiment, and other interesting phenomena. These types of tools enable the staff to discern norms, outliers, and patterns in ever larger quantities of information.

As you can imagine, investors can benefit from having similar tools. Data analytics is useful not only for regulatory purposes, but also for helping investors determine whether a security is a good investment. Thus, it is encouraging to see that the Commission has made progress in utilizing structured data to enhance the disclosure of information to investors. The Commission now requires regulated entities to make a wide range of filings in structured data formats.

However, despite these strides to make better use of technology, there is more that can be done. To tick off a few wish list items—

  • I’d like the SEC to embrace the Legal Entity Identifier with the goal of making public company disclosure to the SEC interoperable with disclosure to other reporting regimes, as recommended by the Data Coalition and XBRL US;
  • I’d like the SEC to require block-tagging of narrative text disclosures; and
  • I’d like the SEC to require detail-tagging within narrative text disclosures.

Within this room, I suspect there is broad consensus that these types of reforms should have been adopted long ago. However, rulemaking is not a simple process. Whenever changes are proposed to securities law disclosure requirements, a common dynamic plays out. Investors tend to favor as much disclosure as possible, while corporate issuers and preparers warn of the burdens and costs of providing the disclosure. The Commissioners must then weigh the competing interests and decide which policies to adopt.

To me, this is one area where technology provides the opportunity for a win-win, because investors and companies can both benefit from greater utilization of structured data. However, the benefits are sometimes indirect, so it may take time before market participants and policymakers can see that the benefits ultimately will justify the costs.

Let’s start by considering the benefits to investors from having expanded access to disclosures in the form of structured data.

Currently, the reports that companies file with the SEC are document-based reports. In particular, the 10-K can be voluminous, with much of the information disclosed in an unstructured format.[2] Searching the unstructured portions is cumbersome—you might thumb through pages of a print-out, or do a ‘Control-F’ word search online. It may be necessary to manually locate and print multiple reports in order to find material that is incorporated by reference, to review prior periods of the same company, or to compare the performance of multiple businesses.

Structured data enables an investor to use enhanced search capabilities. When disclosure is organized and tagged with definitional information, or metadata, software can be written to locate and retrieve what someone is looking for with precision. A search engine that can be trained on a block-tagged description of the company’s business will retrieve results that are more relevant and useful than a search of a larger, unstructured report that lacks metadata sign-posting.

Block tags also can make searching more productive by enabling retrieval of information on a disclosure topic, even when the disclosure itself lacks familiar keywords. Suppose an analyst wants information on an accounting policy. A company might discuss that accounting policy without ever using the analyst’s specific keywords. Block tags can point the search engine to the portion of the filing where the answer to the analyst’s question is likely to be found.
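To illustrate the point about block tags, here is a minimal sketch in Python. It assumes a toy filing whose sections carry hypothetical tag names (`BusinessDescription`, `RevenuePolicy`, `RiskFactors`); these names and the sample text are invented for illustration and are not drawn from any actual taxonomy or filing.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: str   # hypothetical block-tag name, invented for this sketch
    text: str  # narrative text covered by the tag


# Toy filing (invented). The accounting policy below never uses
# the analyst's phrase "revenue recognition".
FILING = [
    Block("BusinessDescription", "We design and sell industrial sensors."),
    Block("RevenuePolicy",
          "Sales are recorded when control of goods passes to the customer."),
    Block("RiskFactors", "Our results depend on a small number of suppliers."),
]


def keyword_search(blocks, phrase):
    """Plain text search: keep blocks whose text contains the phrase."""
    return [b for b in blocks if phrase.lower() in b.text.lower()]


def tag_search(blocks, tag):
    """Metadata search: keep blocks labeled with the requested tag."""
    return [b for b in blocks if b.tag == tag]


if __name__ == "__main__":
    print(keyword_search(FILING, "revenue recognition"))  # finds nothing
    print(tag_search(FILING, "RevenuePolicy"))            # finds the policy
```

In this sketch the keyword search comes back empty because the policy is worded differently, while the tag lookup lands on the right section regardless of wording.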

In addition, if everything in a 10-K were tagged appropriately, investors would be able to generate reports that are personally customized. Suppose someone wants all of the information available on a company in an industry with which she’s unfamiliar. Someone else who’s familiar with the company wants only the latest year-end financial results. Structured data enables us to present investors with a menu of available information so that they each can retrieve only that which they wish to see. It also enables creative infographics with instantaneous updates, using data from multiple sources.

Service providers are already demonstrating the power of structured data for investors. Investors now have access to platforms for analyzing financial statements using as-reported company data piped in from the EDGAR XBRL data set.[3] These platforms include data points that other data aggregators historically haven’t provided.[4] They also include data for smaller companies that other data aggregators historically haven’t covered. These enhancements stem from the SEC’s XBRL tagging requirements.

Other service providers allow investors to combine EDGAR XBRL data with publicly available information from other sources to enable cross-referencing. Other sources could include drug patent expiration data from the Food and Drug Administration and consumer complaint data from the Consumer Financial Protection Bureau.[5] This type of cross-referencing allows investors to more easily consider how consumer sentiment might impact earnings, or how a security breach might impact revenue. These insights previously required time-consuming research, but now they are readily attainable.

Using software that automates different types of analyses makes it easier to spot a company whose numbers “don’t add up.” My colleague Mike Willis predicts that as more people analyze as-reported company data using software that becomes more widely available, it will become harder for companies to get away with reporting deficiencies, or anomalies, or possibly even financial fraud. He likes to say, tongue-in-cheek, that empirical research has shown that the installation of closed-circuit television cameras in a store reduces theft.[6] His point is, isn’t the likely effect of smart data just as obvious?
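As a rough illustration of how software might spot numbers that "don't add up," the following Python sketch checks the basic balance sheet identity (assets equal liabilities plus equity) across a set of filers. The company names, figures, and the simplified labels `Assets`, `Liabilities`, and `Equity` are all invented for this example; real tools would apply many more tests to actual as-reported data.

```python
def find_imbalances(filings, tolerance=0):
    """Return (company, gap) pairs where assets != liabilities + equity.

    `filings` maps a company name to a dict of tagged values.
    The labels used here are simplified placeholders for this sketch.
    """
    flagged = []
    for name, facts in filings.items():
        gap = facts["Assets"] - (facts["Liabilities"] + facts["Equity"])
        if abs(gap) > tolerance:
            flagged.append((name, gap))
    return flagged


# Invented sample data, in arbitrary units.
SAMPLE = {
    "Company A": {"Assets": 500, "Liabilities": 300, "Equity": 200},
    "Company B": {"Assets": 750, "Liabilities": 400, "Equity": 325},
}

if __name__ == "__main__":
    for company, gap in find_imbalances(SAMPLE):
        print(f"{company}: balance sheet is off by {gap}")
```

Because every filer's values carry the same labels, the same check can run across thousands of companies without manual rekeying.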

By prioritizing structured data, and particularly the tagging of text, the Commission could drive even greater innovation in cost-effective enhancements to the packaging and delivery of information. Analytical tools could become more accessible to investors and their intermediaries, and ultimately, these tools could lead to better pricing of securities as information becomes more accessible to market participants. In this way, all investors stand to benefit from structured data, even those who have never heard of it and will never utilize it directly.

Next, let’s consider the potential benefits to corporate issuers who have to provide disclosure. Many companies consider data tagging to be an add-on cost to filing a 10-K or 10-Q. A recent survey indicated that these costs may not be as high as anticipated when XBRL was first adopted,[7] but the costs are still a factor. Companies may be focusing on the costs without appreciating the range of benefits.

In reality, the costs and benefits of reporting in a structured data format are largely related to how the company implements the tagging. Having a financial printer apply tags at the end of the report preparation process will undoubtedly result in a cost increase. However, if the company chooses to standardize data within the company’s internal systems, this will likely result in long-term cost savings.

Currently, many companies still prepare 10-Ks and 10-Qs in a manual assembly process.[8] With distinct lines of business, overseas units, and recent acquisitions, a company might have dozens of databases from which data must be gathered. People from different parts of the company will circulate and combine data, reconcile different versions, deal with formatting issues, and proofread to check that the numbers are all correct. Once gathered, the data will be cut and pasted from the various sources into word processing and spreadsheet applications used for report assembly. It will be reformatted and curated — perhaps arranged in a tabular layout or converted into an infographic. This workflow is labor-intensive and time-consuming, which means that weeks will go by before investors learn the full results of the company’s latest fiscal period.[9]

Now let’s imagine what a more efficient information system could look like.[10] A company might keep its multiple legacy databases but apply to each one an overlay of metadata labels so that computers can retrieve the data needed for reporting purposes. Standardizing data in the databases from which the data originates would allow company personnel to automate the steps of data aggregation. This would shorten the timeframe required for preparing and publishing reports, reduce errors, and allow reporting personnel to devote more of their attention to higher-level review and analysis. In addition, automation of traditionally manual audit activities — such as choosing statistical samples, detecting suspicious transactions, and testing journal entries — would enable auditors to attain their requisite assurance level faster and at a lower cost.
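The metadata overlay idea can be sketched in a few lines of Python. This example assumes two hypothetical legacy ledgers that use different field names for the same concepts, plus an overlay that maps each local field to a shared label. The ledger names, field names, labels, and amounts are invented, and the sketch assumes all amounts are already in one currency.

```python
# Two hypothetical legacy systems with their own field names (invented data).
LEGACY_SOURCES = {
    "us_ledger": {"net_sales": 120, "cogs": 70},
    "eu_ledger": {"umsatz": 80, "herstellkosten": 50},
}

# The overlay: for each source, local field name -> shared reporting label.
# The legacy systems themselves are left unchanged.
OVERLAY = {
    "us_ledger": {"net_sales": "Revenue", "cogs": "CostOfRevenue"},
    "eu_ledger": {"umsatz": "Revenue", "herstellkosten": "CostOfRevenue"},
}


def aggregate(sources, overlay):
    """Sum values across sources by shared label (single currency assumed)."""
    totals = {}
    for source, record in sources.items():
        for field, value in record.items():
            label = overlay[source][field]
            totals[label] = totals.get(label, 0) + value
    return totals


if __name__ == "__main__":
    print(aggregate(LEGACY_SOURCES, OVERLAY))
```

Once the overlay exists, consolidation becomes a lookup and a sum rather than a round of cutting, pasting, and reconciling.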

Efficiencies could also be achieved in the filing process that the SEC controls. For example, the reports that companies file with the SEC could simply be a set of data files that consist of financial statement data and block-tagged sections of narrative text, such as a description of the business. A company would not need to concern itself with packaging the information in a document-based report if the packaging could take place within the Commission’s software platform or in platforms developed by third parties. As a simple illustration, think of the way tax preparation software works. Taxpayers can use software that captures the relevant data without actually “filling out” the Form 1040, and the form is generated at the end of the process at the touch of a button. The same type of report generation could be used for a 10-K.

In addition to the efficiencies and cost savings that a company can achieve through structured data, an even larger benefit is possible. In my view, structured data should improve the liquidity of shares in the marketplace, particularly for smaller companies, because it enables automation of financial analyses. This makes it easier to analyze a far greater number of companies, including smaller companies, and spot a company that may be quietly outperforming (or underperforming) its competitors. Ultimately, this enhanced ability to identify profit-making opportunities should lead to more trading in those securities. In addition, if investors’ research costs go down because of automation, that might encourage them to consider smaller-sized investments, or investments in smaller companies for which the information acquisition costs may now be prohibitive.

To achieve these benefits, it is important for the Commission to be vigilant in requiring greater accuracy and comparability of the data. Toward this end, the Commission recently began allowing companies to voluntarily file structured financial statement data in a format known as Inline XBRL, which is both human-readable and machine-readable.[11] According to the Commission’s order, this format could decrease filing preparation costs, improve the quality of structured data, and, by improving data quality, increase the use of XBRL data by investors and other market participants. The rationale is that, as people view filings in the Inline XBRL format, they will see common tagging errors that are prevalent now, which will induce companies to clean up those errors. In my view, we all need to continue working together to improve the comparability and accuracy of data, and I commend the XBRL consortium for its work in developing and refining taxonomies and related guidance.

Other countries have embraced smart data with results that portend success for us. I’ll tell you about a couple of recent studies. In Belgium, private companies voluntarily provided XBRL-formatted financial statements to banks for the purpose of loan applications. That reduced informational asymmetries between the lenders and borrowers and led to faster credit decisions and lower interest rate spreads.[12] In China, the China Securities Regulatory Commission mandated in 2004 that Chinese-listed firms file XBRL-formatted financial statements. That development led to a decrease in the cost of equity capital for listed companies during the post-adoption period.[13]

In Australia, the government is pursuing a national agenda premised on the idea that information can be ‘captured once, and used often’ to enhance business efficiency and productivity.[14] Companies in Australia can interact with government services through a single, secure online portal. Company personnel can select the relevant report to be completed and have the report pre-filled automatically with data from the company’s internal system software as well as information held by the government. Company personnel can review the pre-filled information, validate it, and then send it securely and directly to the relevant government agency. The information reported has been standardized amongst the various government agencies, so that the same information can be used for multiple reporting purposes.

These success stories should make us more confident about the possibilities for smart data in the United States. However, I will close by noting that while the benefits to investors are easily understood, public companies still need to be shown the potential benefits that can flow from investing resources in information standardization and structuring. On the investor side, entrepreneurs are already developing products to bring the benefits of smart data to a broader population of investors. More needs to be done to bring the benefits to companies. They remain dependent upon software developers coming up with applications that bring a tangible benefit to the bottom line.

Notwithstanding this challenge, the future of smart data looks promising. And on behalf of investors, I thank all of you for the commitment that has taken us this far.

Are there any questions?

[1] See Andrew Ceresney, Dir., U.S. Sec. and Exch. Comm’n Div. of Enf’t, Directors Forum 2016 Keynote Address (Jan. 25, 2016).

[2] The financial statements and footnotes of a 10-K are structured. The rest is unstructured.

[3] See, e.g., Calcbench (last visited Oct. 20, 2016).

[4] One example is unremitted earnings of foreign subsidiaries. Another is product warranties. Net operating losses in XBRL can be disaggregated at the federal as well as the state and local levels. While data aggregators historically haven’t distinguished these, XBRL lets a user do exactly that.

[5] See Open Letter(s) to the Commissioner of the SEC, Idaciti Blog (Jun. 29, 2016) (responding to a speech by SEC Commissioner Kara M. Stein on May 6, 2016 on disclosure in the digital age).

[6] See Leighton Walter Kille & Martin Maximino, The effect of CCTV on public safety: Research roundup, Journalist’s Res. (Feb. 11, 2014).

[7] In 2015, the AICPA surveyed 14 XBRL filing agents providing XBRL tagging and filing services to 1,299 small public companies, defined as having a market capitalization of $75 million or less.

Our survey showed that 69% of the companies paid $10,000 or less on an annual basis for fully outsourced creation and filing solutions of their XBRL filings. Meanwhile, 18% of the companies paid annual costs of between $10,000 and $20,000 for their full service outsourced solutions. Only 8% of companies paid more than $25,000 in annual costs. No company’s annual cost exceeded $50,000. Through discussions with the vendors, we found that companies that paid higher annual fees did so due to complexities in their financial statements and rush charges imposed given the many last-minute changes to the filings (e.g., filing changes for an IPO)….

See Research Shows XBRL Filing Costs Lower than Expected, Am. Institute of CPAs (2015). As this cost estimate was for full outsourcing, it is not directly comparable to the cost estimate in the Commission’s 2009 adopting release for then-new rules requiring public companies to file financial statements and footnotes using XBRL, which contemplated internal burden hours and out-of-pocket costs for software and filing agent services. See U.S. Sec. and Exch. Comm’n, Interactive Data to Improve Financial Reporting, Release No. 33-9002 (Jan. 30, 2009).

[8] See Process Improvement: A Universal Framework for Effecting Change, Workiva (2016); see also Disclosure Management: Streamlining the Last Mile, PwC (2012).

[9] The earnings release is distributed earlier, but it is an incomplete portrayal of financial results.

[10] See Data and Technology: Transforming the Financial Information Landscape, CFA Institute (2016).

[11] U.S. Sec. and Exch. Comm’n, Order Granting Limited and Conditional Exemption Under Section 36(a) of the Securities Exchange Act of 1934 from Compliance with Interactive Data File Exhibit Requirement in Forms 6-K, 8-K, 10-Q, 10-K, 20-F and 40-F to Facilitate Inline Filing of Tagged Financial Data, Release No. 34-78041 (Jun. 13, 2016).

[12] D. Kaya et al., The benefits of structured data across the information supply chain: Initial evidence on XBRL adoption and loan contracting on private firms, J. Account. Pub. Pol’y 35, 417-436 (2016) (surveying evidence that banks charge lower interest rate spreads to voluntary adopters of XBRL as compared to non-adopters, using a sample of Belgian private firms between the years of 2005-2007).

[13] S. Chen et al., How Does XBRL Affect the Cost of Equity Capital? Evidence from an Emerging Market, 14 J. Int’l Acct. Res. 2, 123-145 (2015) (focusing on a sample of listed companies at the Shanghai Stock Exchange and the Shenzhen Stock Exchange in China during the years of 2005-2011).

[14] See Standard Business Reporting (last visited Oct. 20, 2016).