EntityEngine: Entity-Centric Querying and Exploration of Web Data

Overview

    The objective of this project is to produce general methods for expressive and efficient support of structured querying and exploration of the entity-centric Web. The continuous evolution of the Web has made it the primary knowledge source for many people. It has become an information repository full of entities (material or virtual) and descriptions of their properties and relationships. This repository has great potential for enabling advanced information systems and applications that meet users’ information needs on Web data. To discover and explore the entities that interest them, users need structured querying and exploration facilities that deal explicitly with entities, their properties, and their relationships. That is, we need to take an entity-centric, structure-focused view, distinct from the page-centric, text-focused view of generic Web pages. Entity-rich Web data differs from both database tables and unstructured text: the lack of predefined schemata and controlled vocabularies significantly limits the applicability of current approaches based on information extraction.

    We have developed two prototype systems: ERQ and Facetedpedia.

    • ERQ: Entity-Relationship Query: We propose a structured query mechanism, the entity-relationship query, for searching entities in the Wikipedia corpus by their properties and interrelationships. An entity-relationship query consists of multiple predicates on the desired entities, and the semantics of each predicate is specified with keywords. Entity-relationship queries search for entities directly over text rather than over pre-extracted structured data stores. This brings two benefits: (1) query semantics can be expressed intuitively with keywords; (2) only rudimentary entity annotation is required, which is simpler than explicitly extracting and reasoning about complex semantic information before query time. We present a ranking framework for general entity-relationship queries and a position-based Bounded Cumulative Model (BCM) for accurate ranking of query answers. We also explore various weighting schemes to further improve the accuracy of BCM. We test these ideas on a 2008 version of Wikipedia using a collection of 45 queries pooled from the INEX Entity Ranking track and our own crafted queries. Experiments show that both the ranking and the weighting schemes are effective, particularly on multi-predicate queries.
    • Facetedpedia: Dynamic and Query-Dependent Faceted Interface for Wikipedia: We investigate methods for dynamically discovering query-dependent faceted interfaces over text documents. Given the result documents of a keyword search query, the objective is to produce a faceted interface for exploring them. Unlike previous approaches, we aim at methods that are fully automatic and dynamic in both facet dimension generation and category hierarchy construction. Toward this goal, we propose a general faceted search model for document exploration, instantiated in two prototype systems, Facetedpedia and Facetednews, for exploring Wikipedia articles and news articles, respectively. The model uses collaborative vocabularies in Wikipedia, such as its category hierarchy and dense internal hyperlinks, to build faceted interfaces. Given the sheer size and complexity of Wikipedia data, the search space of possible faceted interfaces is prohibitively large. We propose metrics for ranking individual facet hierarchies by user navigational cost, and metrics for ranking interfaces (each with k facet hierarchies) by both average pair-wise facet similarity and average navigational cost. Accordingly, we develop faceted interface discovery algorithms that optimize these ranking metrics. Our experimental evaluation and user studies verify the effectiveness of the proposed metrics, algorithms, and prototype systems.
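    The ERQ query model (multiple keyword-specified predicates over entities, evaluated directly on annotated text) can be illustrated with a small sketch. It assumes a hypothetical representation in which each predicate lists the entity variables it constrains plus its keywords, and in which annotated entity mentions appear as single tokens such as `E:Google`. A candidate binding of entities to variables is scored with a simple proximity rule (the bound entities and all keywords must co-occur within a token window), and per-predicate scores are multiplied so that a binding unsupported by any predicate scores 0. This is a stand-in for illustration only, not the position-based Bounded Cumulative Model or its weighting schemes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Predicate:
    """One query predicate (hypothetical representation).

    `variables` holds one entity variable for a selection predicate or two
    for a relation predicate; `keywords` express the predicate's semantics.
    """
    variables: Tuple[str, ...]
    keywords: Tuple[str, ...]


def min_span(tokens: Sequence[str], targets: Sequence[str]) -> Optional[int]:
    """Length of the shortest token window containing every target, or None."""
    need = set(targets)
    hits = [(i, t) for i, t in enumerate(tokens) if t in need]
    counts: Dict[str, int] = {}
    best: Optional[int] = None
    left = 0
    for pos_r, tok_r in hits:
        counts[tok_r] = counts.get(tok_r, 0) + 1
        # Shrink from the left while the window still covers all targets.
        while len(counts) == len(need):
            pos_l, tok_l = hits[left]
            span = pos_r - pos_l + 1
            if best is None or span < best:
                best = span
            counts[tok_l] -= 1
            if counts[tok_l] == 0:
                del counts[tok_l]
            left += 1
    return best


def predicate_score(pred: Predicate, binding: Dict[str, str],
                    sentences: List[List[str]], window: int) -> int:
    """Number of sentences in which the bound entities and all keywords
    co-occur within `window` tokens (a proximity heuristic, not BCM)."""
    targets = [binding[v] for v in pred.variables] + list(pred.keywords)
    score = 0
    for tokens in sentences:
        span = min_span(tokens, targets)
        if span is not None and span <= window:
            score += 1
    return score


def answer_score(query: List[Predicate], binding: Dict[str, str],
                 sentences: List[List[str]], window: int = 8) -> int:
    """Product of predicate scores: a binding unsupported by any predicate scores 0."""
    score = 1
    for pred in query:
        score *= predicate_score(pred, binding, sentences, window)
    return score


# Hypothetical corpus: entity mentions are pre-annotated as "E:<name>" tokens.
corpus = [
    "E:Google is a technology company".split(),
    "E:Google was founded by E:Larry_Page and E:Sergey_Brin".split(),
    "E:Larry_Page studied at E:Stanford".split(),
]
# Hypothetical query: companies x and the people y who founded them.
query = [
    Predicate(("x",), ("company",)),
    Predicate(("x", "y"), ("founded",)),
]
print(answer_score(query, {"x": "E:Google", "y": "E:Larry_Page"}, corpus))  # 1
print(answer_score(query, {"x": "E:Google", "y": "E:Stanford"}, corpus))    # 0
```

    Multiplying predicate scores encodes the assumption that every predicate must hold for an answer tuple; a ranking model such as BCM would instead weigh where and how the matches occur.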
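    The Facetedpedia idea of ranking a k-facet interface by average pair-wise facet similarity and average navigational cost can likewise be sketched. In this hypothetical version, each candidate facet hierarchy is reduced to a precomputed navigational cost (assumed normalized to [0, 1]) and the set of result documents it covers; pair-wise similarity is approximated with Jaccard overlap of those sets, the two averages are combined with an assumed weight `alpha`, and the best interface is found by brute-force enumeration. The facet names, costs, and similarity measure are illustrative assumptions, not the project's actual metrics or discovery algorithms.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

# Hypothetical facet model: name -> (navigational cost in [0, 1], covered result-doc IDs).
Facets = Dict[str, Tuple[float, FrozenSet[int]]]


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Overlap of two document sets (stand-in for facet similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interface_score(facets: Facets, names: Tuple[str, ...], alpha: float = 0.5) -> float:
    """Lower is better: weighted sum of average cost and average pair-wise similarity."""
    avg_cost = sum(facets[n][0] for n in names) / len(names)
    pairs = list(combinations(names, 2))
    avg_sim = (sum(jaccard(facets[a][1], facets[b][1]) for a, b in pairs) / len(pairs)
               if pairs else 0.0)
    return alpha * avg_cost + (1 - alpha) * avg_sim


def best_interface(facets: Facets, k: int, alpha: float = 0.5) -> Tuple[str, ...]:
    """Exhaustively pick the k facets with the lowest interface score."""
    return min(combinations(sorted(facets), k),
               key=lambda names: interface_score(facets, names, alpha))


facets: Facets = {
    "People": (0.2, frozenset({1, 2, 3})),
    "Persons": (0.25, frozenset({1, 2, 3})),  # near-duplicate of "People"
    "Places": (0.4, frozenset({3, 4, 5})),
    "Years": (0.9, frozenset({1, 5})),
}
print(best_interface(facets, k=2))  # ('People', 'Places')
```

    The near-duplicate pair `People`/`Persons` has the lowest average cost but scores worst because its facets overlap completely. Exhaustive enumeration only works for a handful of candidates; with a search space as large as the one Wikipedia induces, dedicated discovery algorithms are needed.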

People

Publications

  • Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. Discovering and Learning Sensational Episodes of News Events. In Proceedings of the 23rd International World Wide Web Conference (WWW), pages 217-218, Seoul, Korea, April 2014. (poster paper, acceptance rate 110/226=48.7%) PDF
  • Peng Jiang, Huiman Hou, Lijiang Chen, Shimin Chen, Conglei Yao, Chengkai Li, and Min Wang. Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM), pages 345-354, Rome, Italy, February 2013. (acceptance rate 73/387=19%) PDF
  • Afroza Sultana, Quazi Hasan, Ashis Biswas, Soumyava Das, Habibur Rahman, Chris Ding, and Chengkai Li. Infobox Suggestion for Wikipedia Entities. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), pages 2307-2310, Maui, Hawaii, October 2012. (poster paper, acceptance rate 106/228=46.5%) PDF
  • Xiaonan Li, Chengkai Li, and Cong Yu. Entity-Relationship Queries over Wikipedia. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 3, no. 4, article 70:1-20, September 2012. PDF ERQ online demo
  • Xiaonan Li, Chengkai Li, and Cong Yu. Entity-Relationship Queries over Wikipedia. In Proceedings of the 2nd International Workshop on Search and Mining User-generated Contents (SMUC), pages 21-28, Toronto, Canada, October 2010. (Co-located with CIKM 2010) (acceptance rate 8/32=25%) PDF slides Online demo
  • Ning Yan, Chengkai Li, Senjuti B. Roy, Rakesh Ramegowda, and Gautam Das. Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), pages 1927-1928, Toronto, Canada, October 2010. demonstration description. PDF
  • Xiaonan Li, Chengkai Li, and Cong Yu. EntityEngine: Answering Entity-Relationship Queries using Shallow Semantics. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), pages 1925-1926, Toronto, Canada, October 2010. demonstration description. PDF ERQ online demo
  • Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, and Gautam Das. Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. In Proceedings of the 19th International World Wide Web Conference (WWW), pages 651-660, Raleigh, North Carolina, USA, April 2010. (acceptance rate 104/743=14%) PDF slides
  • Xiaonan Li, Chengkai Li, and Cong Yu. Structured Querying of Annotation-Rich Web Text with Shallow Semantics. Technical Report, Department of Computer Science and Engineering, University of Texas at Arlington, March 2010. PDF

Funding

  • National Natural Science Foundation of China Grant 61370019. Research on Crowdsourcing Entity Linkage for the Semantic Web. Wei Hu (PI), Chengkai Li (co-PI), Wenyang Bai (co-PI), Gong Cheng (co-PI). RMB780,000. Jan. 2014-Dec. 2017.
  • 2011 and 2012 HP Labs Innovation Research Awards, Entity-Centric Querying of Enterprise Information for IT Management. Chengkai Li (PI). $80,000.
  • UT Arlington Research Enhancement Program (REP): Mashing Up Information on the Web. Chengkai Li (PI). $10,000. Sept. 2008-Aug. 2009.

Disclaimer

    This material is based upon work partially supported by the National Science Foundation Grants 1018865 and 1117369, 2011 and 2012 HP Labs Innovation Research Awards, the National Natural Science Foundation of China Grant 61370019, and a UTA REP award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.

500 UTA Boulevard
Engineering Research Building (ERB), Room 414
Arlington, TX 76019-0015

Email: cli@uta.edu

© The University of Texas at Arlington 2007-2019. All rights reserved.