ACM SIGMOD 2017 Most Reproducible Paper Awardreproducibility package
There is an unprecedented proliferation of entity graphs in many computing domains. In an entity graph, nodes represent entities (e.g., persons and organizations) and edges represent their relationships. Real-world instances of such graphs include knowledge bases, social networks, and biomedical databases, to name just a few. Users and developers are tapping into entity graphs for numerous applications. Unfortunately, they are very difficult to access. This largely has to do with the sheer size and heterogeneity of such data, where thousands of different types of entities and attributes co-exist. Users are often overwhelmed by the daunting task of attempting to digest and understand their schema and data specification correctly. The development of various graph database systems in recent years mainly focused on storage, indexing, and distributed processing, but not usability. In retrieving data from these databases, the norm is to use either structured query languages such as SQL and SPARQL, or keyword queries. However, writing structured queries that comply with database specification is extremely hard for ordinary users, while keyword queries might generate too many ambiguous answers.
The objective of this research is to tackle the challenges in building user-friendly query systems for graph data. A graph database system with good usability needs query mechanisms more intuitive than structured queries and more powerful than keyword queries. In the project we will develop innovative ways to query heterogeneous graphs and the research will advance the state-of-the-art of database techniques in accessing and managing big graphs. Due to the very different data and query models, the research is a departure from prior study on database usability that mainly focuses on relational databases, in both approaches taken and techniques proposed. With regard to querying graph databases, there is no systematic study of usability issues. The research fills this gap.
As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to write complex graph queries. Our system, GQBE (Graph Query By Example), shows the possibility of this querying paradigm working in practice. The proposed framework automatically derives a hidden query graph based on input query tuples and finds approximate matching answer graphs to obtain a ranked list of top-k answer tuples. We conducted experiments on the real-world Freebase dataset, and observed appealing accuracy and efficiency. Our proposal of querying by example tuples provides a complementary approach to the existing keyword-based and query-graph-based methods, facilitating user-friendly graph querying. To the best of our knowledge, GQBE is among the first few emerging systems to query knowledge graphs by example entity tuples.
This material is based upon work partially supported by the National Science Foundation Grants 1018865 and 1117369, 2011 and 2012 HP Labs Innovation Research Awards, and the National Natural Science Foundation of China Grant 61370019. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.