DEMANDATA: Computational Journalism Research

Overview

    Claims of "fact" are constantly made from data--by journalists, politicians, lobbyists, public relations specialists, sports fans, etc. Wherever numbers and data are involved, such claims can be laden with "lies, d--ed lies, and statistics." Database research has traditionally focused on how to answer queries, and has devoted little attention to assessing the quality of the resulting claims or to formulating good queries from the outset. This project aims to fill that void by advancing the understanding of what makes a data-based claim high-quality, and of how to find queries that lead to such claims.

    This project plans to develop a systematic solution for data-driven "fact-checking" (assessing quality of claims) and "lead-finding" (identifying claims or objects for further investigation), by addressing modeling, algorithmic, and systems challenges.

    There is demand for this research in many domains where decisions are increasingly driven by data, e.g., public policy, business intelligence, homeland security, and health care. Nonetheless, this project chooses public interest journalism as its target domain, because it is one area where resources are severely strained and innovation is urgently needed. In the past, traditional news organizations have provided public interest reporting to hold governments, corporations, and powerful individuals accountable. The decline of traditional media in recent years, however, has led to dwindling support for this vitally important type of journalism. Data-driven fact-checking and lead-finding are growing in importance as more data become publicly available through the movement toward "democratizing data." Taking advantage of this data availability, the project hopes to reduce cost, increase effectiveness, and broaden participation in public interest journalism by putting practical tools in the hands of journalists and citizens alike. Such tools help promote transparency in reporting, boost investigative activities for the underserved, and educate the public in data and quantitative analysis. In sum, this project represents a step towards "democratizing data analysis," complementing the current movement of democratizing data.

People

Publications

  • Naeemul Hassan, Chengkai Li, and Mark Tremayne. Detecting Check-worthy Factual Claims in Presidential Debates. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pages 1835-1838, Melbourne, Australia, October 2015. (KM track short paper; full paper acceptance rate 87/484=18.0%, short paper acceptance rate 36/484=7.4%) PDF ClaimBuster online demo
  • Naeemul Hassan, Bill Adair, James Hamilton, Chengkai Li, Mark Tremayne, Jun Yang and Cong Yu. The Quest to Automate Fact-Checking. In Proceedings of the 2015 Computation+Journalism Symposium, 5 pages, New York City, USA, October 2015.
    PDF ClaimBuster online demo
  • Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, and Qing He. Online Frequent Episode Mining. In Proceedings of the 31st International Conference on Data Engineering (ICDE), Seoul, Korea, April 2015. PDF
  • Brett Walenz, You (Will) Wu, Seokhyun (Alex) Song, Emre Sonmez, Eric Wu, Kevin Wu, Pankaj K. Agarwal, Jun Yang, Naeemul Hassan, Afroza Sultana, Gensheng Zhang, Chengkai Li, and Cong Yu. Finding, Monitoring, and Checking Claims Computationally Based on Structured Data. In Proceedings of the 2014 Computation+Journalism Symposium, New York City, USA, October 2014. PDF
  • Naeemul Hassan, Afroza Sultana, You Wu, Gensheng Zhang, Chengkai Li, Jun Yang, and Cong Yu. Data In, Fact Out: Automated Monitoring of Facts by FactWatcher. In Proceedings of the VLDB Endowment (PVLDB), pages 1557-1560, 2014. demonstration description. (acceptance rate 42/115=36.5%) (excellent demonstration award)
    PDF FactWatcher online demo
  • You Wu, Brett Walenz, Peggy Li, Andrew Shim, Emre Sonmez, Pankaj Agarwal, Chengkai Li, Jun Yang, and Cong Yu. iCheck: Computationally Combating "Lies, D---ned Lies, and Statistics". In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1063-1066, Snowbird, Utah, USA, June 2014. demonstration description. (acceptance rate 29/76=38%) PDF
  • Gensheng Zhang, Xiao Jiang, Ping Luo, Min Wang, and Chengkai Li. Discovering General Prominent Streaks in Sequence Data. In ACM Transactions on Knowledge Discovery from Data (TKDD), 8(2):article 9, June 2014.
    PDF FactWatcher online demo
  • Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. Discovering and Learning Sensational Episodes of News Events. In Proceedings of the 23rd International World Wide Web Conference (WWW), pages 217-218, Seoul, Korea, April 2014. (poster paper, acceptance rate 110/226=48.7%) PDF
  • You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. Toward Computational Fact-Checking. In Proceedings of the VLDB Endowment (PVLDB), 7(7):589-600, March 2014. PDF
  • Afroza Sultana, Naeemul Hassan, Chengkai Li, Jun Yang, and Cong Yu. Incremental Discovery of Prominent Situational Facts. In Proceedings of the 30th International Conference on Data Engineering (ICDE), pages 112-123, Chicago, Illinois, USA, March 2014. (acceptance rate 89/446=20%) PDF slides FactWatcher online demo
  • You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. On "One of the Few" Objects. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1487-1495, Beijing, China, August 2012. (acceptance rate 133/755=17.6%) PDF
  • Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, and Yong Yu. Prominent Streak Discovery in Sequence Data. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1280-1288, San Diego, California, USA, August 2011. (full paper, poster presentation, acceptance rate 125/714=17.5%) PDF FactWatcher online demo
  • Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. Computational Journalism: A Call to Arms to Database Researchers. In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR), pages 148-151, Asilomar, California, USA, January 2011. (3rd place in best Outrageous Ideas and Vision (OIV) Track paper competition) PDF

Funding

  • Knight Prototype Fund, ClaimBuster. Chengkai Li (PI), Naeemul Hassan, Mark Tremayne, Bill Adair, Jun Yang, Cong Yu. $35,000. Nov. 2015-Apr. 2016.
  • NSF Award #1565699, I-Corps Team: ClaimBuster: Automated, Live Fact-Checking. Chengkai Li (PI). $50,000. Nov. 2015-Apr. 2016.
  • NSF Award #1408928, III: Medium: Collaborative Research: From Answering Questions to Questioning Answers (and Questions)---Perturbation Analysis of Database Queries. Chengkai Li (PI). $241,778. Sept. 2014-Aug. 2017. (Collaborative grant with Duke University (Jun Yang (PI), Bill Adair (Co-PI), Pankaj K. Agarwal (Co-PI)) and Stanford University (James T. Hamilton (PI)). The lead institution is Duke, and the grant totals $1.2 million.)
  • National Natural Science Foundation of China Grant 61370019. Research on Crowdsourcing Entity Linkage for the Semantic Web. Wei Hu (PI), Chengkai Li (co-PI), Wenyang Bai (co-PI), Gong Cheng (co-PI). RMB780,000. Jan. 2014-Dec. 2017.
  • 2011 and 2012 HP Labs Innovation Research Awards, Entity-Centric Querying of Enterprise Information for IT Management. Chengkai Li (PI). $80,000.

500 UTA Boulevard
Engineering Research Building (ERB), Room 414
Arlington, TX 76019-0015

Tel: 817-272-0162
Email: cli@uta.edu