3rd place in the CIDR11 Best Outrageous Ideas and Vision (OIV) Track Paper competition
First Runner-Up Award in the SIGMOD17 Undergraduate Student Research Competition (Damian Jimenez)
VLDB14 Excellent Demonstration Award
Claims of "fact" are constantly made from data by journalists, politicians, lobbyists, public relations specialists, sports fans, and others. Wherever numbers and data are involved, such claims can be laden with "lies, d--ed lies, and statistics." Database research has traditionally focused on how to answer queries, but has devoted little attention to discerning the quality of the resulting claims or to formulating good queries from the outset. This project aims to fill this void by advancing the understanding of what makes a high-quality claim based on data, and how to find the queries that lead to such claims.
This project plans to develop a systematic solution for data-driven "fact-checking" (assessing quality of claims) and "lead-finding" (identifying claims or objects for further investigation), by addressing modeling, algorithmic, and systems challenges.
There is demand for this research in many domains where decisions are increasingly driven by data, e.g., public policy, business intelligence, homeland security, and health care. Nonetheless, this project chooses public interest journalism as its target domain, because it is one area where resources are severely strained and innovation is pressingly needed. In the past, traditional news organizations have provided public interest reporting to hold governments, corporations, and powerful individuals accountable. The decline of traditional media in recent years, however, has led to dwindling support for this vitally important type of journalism. Data-driven fact-checking and lead-finding are growing in importance, as more data become publicly available in the movement of "democratizing data." Taking advantage of data availability, this project hopes to reduce cost, increase effectiveness, and broaden participation for public interest journalism, by putting practical tools in the hands of journalists and citizens alike. Such tools help promote transparency in reporting, boost investigative activities for the underserved, and educate the public in data and quantitative analysis. In sum, this project represents a step towards "democratizing data analysis," complementing the current movement of democratizing data.
This material is based upon work partially supported by the National Science Foundation Grants 1408928 and 1565699, a Knight Prototype Fund grant, 2011 and 2012 HP Labs Innovation Research Awards, and the National Natural Science Foundation of China Grant 61370019. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.