October 1, 2008
I am an accounting professor at the University of Miami and conduct empirical research in the areas of financial reporting and auditing. My comments on the current system and suggestions for the future system should be fairly representative of accounting and finance researchers who use SEC filings as their primary data source.
Academic Use of the EDGAR System
Accounting and finance researchers increasingly obtain and analyze SEC filings as follows:
1. Filings are identified from the company.idx file on the SEC's FTP site. From this, all filings of a particular type are downloaded from the FTP site. For example, for a recent research project, I was interested in the contents of firms' audit opinions and opinions on internal controls. I downloaded all 10-K filings using Perl (a programming language).
2. Relevant information is then extracted from the filings and analyzed using Perl. In the case of the audit opinions, the opinions were extracted from the 10-K filings, using a variety of search algorithms in Perl, and then stored in a database. The opinions were further analyzed and coded electronically. Over 30,000 audit opinions were analyzed for this project.
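As a rough illustration of step 1, the sketch below filters an index listing down to the 10-K filings. It is written in Python rather than Perl purely for illustration and is not the script used for the project. The sample rows, the function name select_filings, and the rule of splitting columns on runs of two or more spaces are all assumptions; the real company.idx is a fixed-width file and may need column offsets instead.

```python
import re

# Hypothetical rows laid out in the style of company.idx (not real filings).
SAMPLE_INDEX = """\
Company Name                  Form Type   CIK         Date Filed  File Name
---------------------------------------------------------------------------
EXAMPLE CORP                  10-K        1000001     2008-03-01  edgar/data/1000001/0001000001-08-000001.txt
EXAMPLE CORP                  8-K         1000001     2008-04-15  edgar/data/1000001/0001000001-08-000002.txt
SAMPLE HOLDINGS INC           10-K        1000002     2008-03-10  edgar/data/1000002/0001000002-08-000003.txt
"""


def select_filings(index_text, form_type="10-K"):
    """Return one dict per index row whose form type equals form_type."""
    selected = []
    past_header = False
    for line in index_text.splitlines():
        if not past_header:
            # Everything up to and including the dashed rule is header text.
            if line.startswith("-----"):
                past_header = True
            continue
        # Assumption: columns are separated by at least two spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        name, form, cik, date_filed, path = fields
        if form == form_type:
            selected.append(
                {"name": name, "cik": cik, "date": date_filed, "path": path}
            )
    return selected


filings = select_filings(SAMPLE_INDEX)
for filing in filings:
    print(filing["cik"], filing["date"], filing["path"])
```

Each returned path could then be passed to whatever download routine is available, and the loop repeated over every quarterly index.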
Having access to this source data has been tremendously helpful for me and for many other academic researchers. I have taught courses at London Business School, Michigan State University, Penn State University, Purdue University, and the University of Tennessee on how to obtain and analyze data from EDGAR using the Perl programming language (for more details go to my website http://sbaleone.bus.miami.edu), and there is strong interest in these methods.
Limitations of the existing system
As has been recognized by the SEC and others, the SGML file format is less "user friendly" than, say, XBRL. The challenge with the existing format is that somewhat advanced search tools are required to find information within the files because there is very limited tagging. Even using these tools, it is sometimes impossible to find certain items without a more comprehensive tagging system.
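To make the difficulty concrete, here is a minimal sketch of the kind of pattern matching that untagged text forces on researchers. It is in Python rather than Perl, and the heading patterns, the sample filing text, and the function name extract_audit_opinion are illustrative assumptions, not the search algorithms actually used. The heuristic is fragile by design: it cuts off at the next all-capitals line, so an auditor's signature in capitals or an unusual heading would defeat it.

```python
import re

# Assumed heading variants that introduce an audit opinion.
HEADING = re.compile(
    r"^[ \t]*REPORT OF INDEPENDENT "
    r"(?:REGISTERED PUBLIC ACCOUNTING FIRM|AUDITORS|ACCOUNTANTS)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)
# Assumed end marker: the next line written entirely in capitals.
NEXT_HEADING = re.compile(r"^[A-Z][A-Z ,.&'-]{10,}$", re.MULTILINE)

# Hypothetical filing excerpt (not taken from a real 10-K).
SAMPLE_FILING = """\
ITEM 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA

REPORT OF INDEPENDENT REGISTERED PUBLIC ACCOUNTING FIRM

To the Board of Directors and Shareholders of Example Corp:

In our opinion, the consolidated financial statements present fairly,
in all material respects, the financial position of Example Corp.

CONSOLIDATED BALANCE SHEETS
"""


def extract_audit_opinion(filing_text):
    """Return the text between the opinion heading and the next heading."""
    start = HEADING.search(filing_text)
    if start is None:
        return None  # no recognizable opinion heading
    rest = filing_text[start.end():]
    stop = NEXT_HEADING.search(rest)
    body = rest[:stop.start()] if stop else rest
    return body.strip()


opinion = extract_audit_opinion(SAMPLE_FILING)
print(opinion)
```

With a tag around the opinion, the two regular expressions and their failure modes would be replaced by a single lookup.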
Needs for the new system:
1. Large-scale access - As described above, academic researchers conduct large-sample studies, and it is vitally important that we continue to have methods for large-scale access going forward. Currently we can simply download all the filings via FTP with a program and then loop through them to extract the data we need from tens of thousands of filings. My biggest fear is that the new system will be database oriented and will only allow access to the data via a web-based GUI (Graphical User Interface). This would be a huge setback for academic researchers. Imagine trying to obtain the audit opinions of 10,000 firms and then putting those opinions, along with company name, CIK number, etc., into a database one at a time. The new system either needs to have an FTP site containing filings (similar to the current one), or there needs to be direct read access to a database that will allow researchers to run SQL queries to extract the data they need.
2. Tagging - Obviously, XBRL or some type of tagging would greatly simplify the process of identifying and extracting needed data.
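The database alternative described in item 1 might look something like the sketch below, which uses Python's built-in sqlite3 module with an in-memory database. The table name audit_opinions, its columns, the sample rows, and the function count_matching are all hypothetical; nothing here reflects an actual or planned SEC schema.

```python
import sqlite3

# In-memory stand-in for a read-only research database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_opinions ("
    "cik TEXT, company_name TEXT, fiscal_year INTEGER, opinion_text TEXT)"
)

# Made-up rows for illustration only.
rows = [
    ("1000001", "EXAMPLE CORP", 2007,
     "There is substantial doubt about its ability to continue as a going concern."),
    ("1000002", "SAMPLE HOLDINGS INC", 2007,
     "In our opinion, the financial statements present fairly the financial position."),
    ("1000001", "EXAMPLE CORP", 2008,
     "These conditions raise doubt about the entity continuing as a going concern."),
]
conn.executemany("INSERT INTO audit_opinions VALUES (?, ?, ?, ?)", rows)


def count_matching(connection, phrase):
    """Count opinions per fiscal year whose text contains the phrase."""
    query = (
        "SELECT fiscal_year, COUNT(*) FROM audit_opinions "
        "WHERE opinion_text LIKE ? "
        "GROUP BY fiscal_year ORDER BY fiscal_year"
    )
    # The phrase is bound as a parameter rather than pasted into the SQL.
    return connection.execute(query, (f"%{phrase}%",)).fetchall()


print(count_matching(conn, "going concern"))
```

A single query of this kind covers every firm at once, which is the contrast with retrieving opinions one at a time through a web interface.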
Andrew J. Leone
Professor of Accounting
School of Business
University of Miami
Coral Gables, FL