Entity-Relationship Query (ERQ) is an entity-centric, structured query mechanism. You query named entities by specifying what types of entities you want and what relations hold among them. The current demo supports 0.75 million entities collected from Wikipedia, organized into 10 types: Person, Company, University, City, Novel, Player, Club, Song, Film, and Award. The corpus is the 2008-07-24 snapshot of Wikipedia.
Example Queries
- Find who won World Chess Champion titles.
- Find Catalan universities.
- Find German cities that were in the Hanseatic League.
- Find companies and their founders, where the companies are in Silicon Valley and founders are Stanford graduates.
- Find an Australian actress, an Academy Award winning film, and a Grammy Award winning song, where the actress stars in the film and the song is the theme of that film.
Four Types of Queries in ERQ
- SS: Single-Predicate Query (Selection Predicate)
- SJ: Single-Predicate Query (Join Predicate)
- MS: Multi-Predicate Query (without Join Predicate)
- MJ: Multi-Predicate Query (with Join Predicate)
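The four query types can be made concrete with a small sketch. The code below is not ERQ's query syntax or implementation; `Predicate`, `Query`, and `classify` are hypothetical names introduced only for illustration. It also assumes the usual database reading of the terms: a selection predicate constrains a single entity variable, while a join predicate relates two or more. Keywords are split loosely from the example queries above.

```python
from __future__ import annotations

from dataclasses import dataclass

# The 10 entity types supported by the demo.
ENTITY_TYPES = {"Person", "Company", "University", "City", "Novel",
                "Player", "Club", "Song", "Film", "Award"}


@dataclass(frozen=True)
class Predicate:
    """Hypothetical predicate: keywords constraining one or more variables."""
    variables: tuple[str, ...]
    keywords: tuple[str, ...]

    @property
    def is_join(self) -> bool:
        # Assumption: a join predicate relates two or more entity variables.
        return len(set(self.variables)) >= 2


@dataclass
class Query:
    """Hypothetical query: typed entity variables plus keyword predicates."""
    entity_types: dict[str, str]   # variable name -> entity type
    predicates: list[Predicate]


def classify(query: Query) -> str:
    """Return SS, SJ, MS, or MJ for a query."""
    if not query.predicates:
        raise ValueError("a query needs at least one predicate")
    for var, etype in query.entity_types.items():
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type for {var}: {etype}")
    for pred in query.predicates:
        for var in pred.variables:
            if var not in query.entity_types:
                raise ValueError(f"undeclared variable: {var}")
    has_join = any(p.is_join for p in query.predicates)
    if len(query.predicates) == 1:
        return "SJ" if has_join else "SS"
    return "MJ" if has_join else "MS"


# "Find Catalan universities."
catalan_universities = Query(
    entity_types={"x": "University"},
    predicates=[Predicate(("x",), ("Catalan",))],
)

# "Find companies and their founders, where the companies are in
# Silicon Valley and founders are Stanford graduates."
company_founders = Query(
    entity_types={"x": "Company", "y": "Person"},
    predicates=[
        Predicate(("x",), ("Silicon Valley",)),
        Predicate(("y",), ("Stanford", "graduate")),
        Predicate(("x", "y"), ("founder",)),
    ],
)

print(classify(catalan_universities))  # SS
print(classify(company_founders))      # MJ
```

Under these assumptions, the Silicon Valley query is MJ because it combines two selection predicates with one join predicate linking the company to its founder.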
Brief Comparison with Other Approaches/Systems
- The DB-based approach pre-extracts structured information from text into databases to support SQL queries on entities. Systems taking this approach (e.g., ExDB) are limited by the capabilities of information extraction (IE) and natural language processing (NLP) technologies. Facts that are not extracted are lost and thus cannot be queried in the database. ERQ circumvents the problem by querying the text directly, rather than extracted facts. A detailed comparison between ERQ and TextRunner, a state-of-the-art IE system, is provided for the INEX17 query set.
- The Semantic Web approach encodes entity-relationship information, as well as general knowledge, in RDF format. It enables expressive SPARQL queries coupled with reasoning power, which ERQ lacks. However, RDF data are currently either collected manually or extracted automatically using IE/NLP. The former limits system scalability, while the latter suffers from the same problems as the DB-based approach.
- The IR-based approach is exemplified by the INEX Entity Ranking track. It focuses on finding entities according to narrative descriptions, where the descriptions are often treated as term vectors, as in traditional IR systems. INEX queries do not embody the notion of a predicate and are thus unstructured. The newly formed Entity track in TREC is similar to INEX in this sense.
- ERQ takes the DB+IR approach. On the one hand, ERQ supports SQL-like structured queries consisting of multiple predicates. On the other hand, the predicates are defined by keywords, as in IR queries. We acknowledge that the effectiveness of our approach partially relies on the user's ability to provide proper keyword constraints, just as in IR queries. Some related work (e.g., EntityRank) is similar to ERQ in that its queries are composed of keywords (like single-predicate queries in ERQ, not narrative descriptions as in INEX or TREC). However, those systems do not support multi-predicate queries. Moreover, they focus only on precision at the top few ranks, while ERQ attempts to maintain good precision over a longer range of ranks.