Deception And Truth Analysis Datasets

Deception And Truth Analysis DATAbase
DATAbase is a structured dataset of Deception And Truth Analysis Scores for the regulatory filings and transcripts of almost all US equities, dating back to 2008 and updated in near real time. DATA is grounded in more than 100 years of deception science research. We use NLP to look for more than 30 known behavioral differences between deceivers and truth tellers in text-based communications.

DATAbase's regulatory filings include 10-Ks, 10-Qs, 8-Ks, DEF 14As (proxies), 20-Fs, and S-1s (IPOs). Transcripts include earnings calls, guidance updates, conference presentations, and so on. Coverage exceeds 5,800 equities, including the constituents of the S&P 500, 400, and 600, as well as the Russell 1000, 2000, and 3000.

DATA Scores have been demonstrated to be predictive of future stock price performance. They are 88.4% accurate in discriminating between deceptiveness and truthfulness in text-based communications, with a p-value < 1.41648534942489E-11 and a Cohen's d of 4.23. The Type I error rate is 11.3%, and the Type II error rate is 14.3%.
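
For readers less familiar with the statistics quoted above, the following is a minimal sketch of how accuracy, Type I and Type II error rates, and Cohen's d are conventionally computed. It is not D.A.T.A's evaluation code. The class and function names and all counts and samples are hypothetical, and it assumes that "truthful" is the null hypothesis (so a Type I error is a truthful document scored as deceptive) and that Cohen's d uses the pooled standard deviation.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical labeled-document counts, not D.A.T.A's published figures."""
    true_positive: int   # deceptive documents scored as deceptive
    false_negative: int  # deceptive documents scored as truthful
    true_negative: int   # truthful documents scored as truthful
    false_positive: int  # truthful documents scored as deceptive


def accuracy(c: ConfusionCounts) -> float:
    """Share of all documents that were classified correctly."""
    total = c.true_positive + c.false_negative + c.true_negative + c.false_positive
    return (c.true_positive + c.true_negative) / total


def type_i_error_rate(c: ConfusionCounts) -> float:
    """Share of truthful documents wrongly scored as deceptive (assumed null: truthful)."""
    return c.false_positive / (c.false_positive + c.true_negative)


def type_ii_error_rate(c: ConfusionCounts) -> float:
    """Share of deceptive documents wrongly scored as truthful."""
    return c.false_negative / (c.false_negative + c.true_positive)


def cohens_d(group_a: list, group_b: list) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Made-up counts purely for illustration.
example = ConfusionCounts(true_positive=90, false_negative=10,
                          true_negative=85, false_positive=15)
print(accuracy(example), type_i_error_rate(example), type_ii_error_rate(example))
```

Note that accuracy depends on the mix of deceptive and truthful documents in the test set, so the three percentages need not add up in any simple way.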

Approximately 10 minutes after a new filing is made on EDGAR, D.A.T.A publishes new DATA Scores for that filing.

Deception And Truth Analysis MD&A DATAbase
DATAbase MD&A is a structured dataset of Deception And Truth Analysis Scores for the regulatory filings and transcripts of almost all US equities, dating back to 2008 and updated in near real time. DATA is grounded in more than 100 years of deception science research. We use NLP to look for more than 30 known behavioral differences between deceivers and truth tellers in text-based communications.

MD&A Version
The DATAbase MD&A dataset focuses its analysis on the Management's Discussion and Analysis (MD&A) section of 10-Ks and 10-Qs, which regulation requires those reports to include. That section requires management to explain, in plain language, the performance for the period in question. Because the MD&A contains language that is more spontaneous than boilerplate, many Natural Language Processing investment researchers believe it contains better buy or sell signals.
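
To make the idea of isolating the MD&A concrete, here is a rough sketch of how the section might be located in the plain text of a 10-K. This is an illustration only and not D.A.T.A's method: the function name `extract_mda` and both patterns are hypothetical, it assumes the MD&A is headed "Item 7" and ends at "Item 7A" or "Item 8" (a 10-Q numbers its items differently), and it takes the last matching heading on the assumption that earlier matches are table-of-contents entries.

```python
import re
from typing import Optional

# Hypothetical heading patterns; real filings vary widely in formatting.
_MDA_START = re.compile(
    r"item\s+7\.?\s*management['\u2019]s\s+discussion\s+and\s+analysis",
    re.IGNORECASE,
)
_MDA_END = re.compile(r"item\s+(7a|8)\.", re.IGNORECASE)


def extract_mda(filing_text: str) -> Optional[str]:
    """Return the MD&A section of a 10-K's plain text, or None if no heading is found."""
    starts = list(_MDA_START.finditer(filing_text))
    if not starts:
        return None
    # Assume the last heading is the section itself, not the table of contents.
    start = starts[-1]
    # The section runs until the next item heading, or to the end of the text.
    end = _MDA_END.search(filing_text, start.end())
    end_pos = end.start() if end else len(filing_text)
    return filing_text[start.start():end_pos].strip()


sample = (
    "Item 7. Management's Discussion and Analysis ... 25\n"
    "Item 8. Financial Statements ... 40\n\n"
    "Item 7. Management's Discussion and Analysis of Financial Condition\n"
    "Revenue grew because of new contracts.\n"
    "Item 7A. Quantitative and Qualitative Disclosures\n"
    "Market risk."
)
print(extract_mda(sample))
```

A production pipeline would need to handle HTML and XBRL markup, cross-references to "Item 7" in later sections, and filings that incorporate the MD&A by reference, none of which this sketch attempts.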

Coverage exceeds 5,800 equities, including the constituents of the S&P 500, 400, and 600, as well as the Russell 1000, 2000, and 3000.

DATA Scores have been demonstrated to be predictive of future stock price performance. They are 88.4% accurate in discriminating between deceptiveness and truthfulness in text-based communications, with a p-value < 1.41648534942489E-11 and a Cohen's d of 4.23. The Type I error rate is 11.3%, and the Type II error rate is 14.3%.

Approximately 10 minutes after a new filing is made on EDGAR, D.A.T.A publishes new DATA Scores for that filing.

From receipt, D.A.T.A takes less than 1 minute to score a new document.
