Entity-Relationship Queries

Entity-Relationship Query (ERQ) is an entity-centric, structured query mechanism. You query named entities by specifying the types of entities you want and the relationships among them. The current demo supports 0.75 million entities collected from Wikipedia, organized into 10 types: Person, Company, University, City, Novel, Player, Club, Song, Film and Award. The corpus is the 2008-07-24 snapshot of Wikipedia.


Four Types of Queries in ERQ

  1. SS: Single-Predicate Query (Selection Predicate)
  2. SJ: Single-Predicate Query (Join Predicate)
  3. MS: Multi-Predicate Query (without Join Predicate)
  4. MJ: Multi-Predicate Query (with Join Predicate)
The INEX17 query set (converted from INEX topics) contains only SS and MS queries. OWN28 contains SS, MS and MJ queries. We did not design SJ queries for OWN28 because such queries are less common than the other three types. A short guide on how to compose your queries can be found here.
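The four query types can be told apart mechanically from the number of predicates and whether any of them is a join. The following is a minimal sketch, not ERQ's actual implementation: the `Predicate` class, its fields, the `classify` function, and the example keywords are all hypothetical, and it assumes the usual database reading in which a selection predicate constrains one entity variable and a join predicate relates two or more.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical predicate: entity variables plus keyword constraints."""
    variables: FrozenSet[str]   # e.g. {"x"} or {"x", "y"}
    keywords: Tuple[str, ...]   # keywords that define the predicate

    @property
    def is_join(self) -> bool:
        # Assumption: a predicate over two or more variables is a join.
        return len(self.variables) >= 2


def classify(predicates: List[Predicate]) -> str:
    """Return "SS", "SJ", "MS" or "MJ" for a list of predicates."""
    if not predicates:
        raise ValueError("a query needs at least one predicate")
    count = "S" if len(predicates) == 1 else "M"
    kind = "J" if any(p.is_join for p in predicates) else "S"
    return count + kind


# Illustrative queries; the keywords are made up for this example.
ss = [Predicate(frozenset({"x"}), ("won", "award"))]
sj = [Predicate(frozenset({"x", "y"}), ("graduated", "from"))]
ms = ss + [Predicate(frozenset({"x"}), ("born", "in"))]
mj = ss + sj

for name, query in [("ss", ss), ("sj", sj), ("ms", ms), ("mj", mj)]:
    print(name, "->", classify(query))
```

Under these assumptions, a multi-predicate query counts as MJ as soon as one of its predicates is a join, which matches the "with Join Predicate" wording in the list above.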

Brief Comparison with Other Approaches/Systems

  • The DB-based approach pre-extracts structured information from text into databases to support SQL queries on entities. Systems taking this approach (e.g., ExDB) are limited by the capabilities of information extraction (IE) and natural language processing (NLP) technologies: facts that are not extracted are lost and cannot be queried in the database. ERQ circumvents this problem by querying the text directly rather than extracted facts. A detailed comparison between ERQ and TextRunner, a state-of-the-art IE system, is provided for the INEX17 query set.
  • The Semantic Web approach encodes entity-relationship information, as well as general knowledge, in RDF format, and enables expressive SPARQL queries coupled with reasoning power, which ERQ lacks. However, RDF data are currently either collected manually or extracted automatically using IE/NLP. The former limits system scalability, while the latter suffers from the same problems as the DB-based approach.
  • The IR-based approach is exemplified by the INEX Entity Ranking track. It focuses on finding entities according to narrative descriptions, which are often treated as term vectors, as in traditional IR systems. INEX queries do not embody the notion of a predicate and are thus unstructured. The newly formed Entity track in TREC is similar to INEX in this sense.
  • ERQ takes the DB+IR approach. On the one hand, ERQ supports SQL-like structured queries consisting of multiple predicates. On the other hand, the predicates are defined by keywords, as in IR queries. We acknowledge that the effectiveness of our approach depends partly on the user's ability to provide proper keyword constraints, just as in IR queries. Some related work (e.g., EntityRank) is similar to ERQ in that its queries are composed of keywords (like single-predicate queries in ERQ, not narrative descriptions as in INEX or TREC). However, those systems do not support multi-predicate queries. Moreover, they focus only on precision at the top few ranks, whereas ERQ attempts to maintain good precision over a longer range of ranks.
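To make the distinction between precision at the top few ranks and precision over a longer range concrete, here is a minimal sketch of the standard precision-at-k measure. The function name `precision_at_k` and the toy ranking are illustrative assumptions and are not taken from ERQ's evaluation; the entity labels are placeholders.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked answers that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for entity in ranked[:k] if entity in relevant)
    # Divide by k (not by the number returned), so short result
    # lists are penalized for the ranks they leave empty.
    return hits / k


# Toy ranking of ten placeholder entities, four of which are relevant.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "b", "e", "j"}

for k in (2, 5, 10):
    print(f"P@{k} = {precision_at_k(ranked, relevant, k):.2f}")
```

In this toy ranking the measure is perfect at k = 2 but drops as k grows, which is the kind of behavior that an evaluation limited to the top few ranks would not reveal.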

Contact: Chengkai Li
The Innovative Database and Information Systems Research Lab
Room 237, GeoScience Building, University of Texas at Arlington