RankSQL: Database Engine Extended with Top-k Query Algorithms and Query Optimization

Overview

    The ubiquitous usage of databases for managing structured data, compounded with the expanded reach of the Internet to end users, has brought forward new scenarios of data retrieval. Users often want to express non-traditional fuzzy queries with soft criteria, in contrast to Boolean queries, and to explore what choices are available in databases and how they match the query criteria. Conventional database management systems (DBMSs) have become increasingly inadequate for such new scenarios.

    Towards enabling data retrieval, we first studied how to fundamentally integrate ranking into databases. We built RankSQL, a DBMS that provides systematic and principled support of ranking queries. With a new ranking algebra and an extended query optimizer for the algebra, RankSQL captures ranking as a first-class construct in databases, together with traditional Boolean constructs. We invented efficient techniques for answering ad-hoc ranking aggregate queries. RankSQL provides significant performance improvement over current DBMSs in processing ranking queries and ranking aggregate queries.
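
    To make the two query types named above concrete, here is a minimal sketch of what a ranking (top-k) query and a ranking aggregate query ask for, written against SQLite from Python's standard library. The hotel table, its columns, the sample rows, and the scoring weights are all hypothetical and chosen only for illustration. The sketch shows the semantics in stock SQL, where the engine scores and sorts every qualifying row before cutting at k; it does not show RankSQL's ranking algebra or optimizer, whose purpose is to avoid that full sort.

```python
import sqlite3

# Hypothetical sample data: (name, city, price, rating).
HOTELS = [
    ("Ashton", "Chicago", 120.0, 4.5),
    ("Birch", "Chicago", 80.0, 3.9),
    ("Cedar", "Chicago", 200.0, 4.8),
    ("Dover", "Boston", 60.0, 4.9),
    ("Elm", "Chicago", 95.0, 4.2),
]


def make_db():
    """Build an in-memory database holding the hypothetical hotel table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotel (name TEXT, city TEXT, price REAL, rating REAL)"
    )
    conn.executemany("INSERT INTO hotel VALUES (?, ?, ?, ?)", HOTELS)
    return conn


def top_k_hotels(conn, city, k, w_rating=10.0, w_price=0.1):
    """Ranking query: a Boolean filter (city) combined with a soft
    criterion (a weighted score), returning only the k best rows."""
    rows = conn.execute(
        "SELECT name FROM hotel WHERE city = ? "
        "ORDER BY (? * rating - ? * price) DESC LIMIT ?",
        (city, w_rating, w_price, k),
    ).fetchall()
    return [name for (name,) in rows]


def top_k_cities(conn, k):
    """Ranking aggregate query: rank groups (cities) by an aggregate
    (average rating) and return only the k best groups."""
    rows = conn.execute(
        "SELECT city FROM hotel GROUP BY city "
        "ORDER BY AVG(rating) DESC LIMIT ?",
        (k,),
    ).fetchall()
    return [city for (city,) in rows]


if __name__ == "__main__":
    conn = make_db()
    print(top_k_hotels(conn, "Chicago", 2))
    print(top_k_cities(conn, 1))
```

    In this formulation the cost grows with the number of qualifying rows rather than with k, which is the inefficiency that treating ranking as a first-class construct is meant to address.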

    We further studied how to enable retrieval mechanisms beyond just ranking. Our exploratory study in this direction is exemplified by two novel proposals: one is to integrate clustering and ranking of database query results; the other is to support inverse ranking queries, which return the rank of a given object within a query context. Injecting such non-traditional facilities into databases presents non-trivial challenges in both defining query semantics and designing query processing methods. We extended the SQL language to express such queries and invented partition- and summary-driven approaches to process them.
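
    As an illustration of the inverse ranking idea described above, the following sketch computes the rank of one object among the objects that satisfy a query context. It is a brute-force scan written under assumed semantics (rank is one plus the number of in-context objects with a strictly higher score); the function name, the record layout, and the sample data are hypothetical. It does not implement the partition- and summary-driven approaches mentioned above.

```python
# Hypothetical sample records.
HOTELS = [
    {"id": "Ashton", "city": "Chicago", "price": 120.0, "rating": 4.5},
    {"id": "Birch", "city": "Chicago", "price": 80.0, "rating": 3.9},
    {"id": "Cedar", "city": "Chicago", "price": 200.0, "rating": 4.8},
    {"id": "Dover", "city": "Boston", "price": 60.0, "rating": 4.9},
    {"id": "Elm", "city": "Chicago", "price": 95.0, "rating": 4.2},
]


def rank_in_context(objects, target_id, in_context, score):
    """Return the 1-based rank of the target among objects that satisfy
    in_context, ordered by descending score. Return None if the target
    is missing or does not itself satisfy the context."""
    target = next((o for o in objects if o["id"] == target_id), None)
    if target is None or not in_context(target):
        return None
    target_score = score(target)
    # Count in-context objects that strictly outrank the target.
    better = sum(
        1 for o in objects if in_context(o) and score(o) > target_score
    )
    return 1 + better


def by_rating(hotel):
    """Score a hotel by its rating alone (an assumed ranking function)."""
    return hotel["rating"]


if __name__ == "__main__":
    def in_chicago(h):
        return h["city"] == "Chicago"

    def cheap_in_chicago(h):
        return h["city"] == "Chicago" and h["price"] < 100

    # The same object ranks differently as the context changes.
    print(rank_in_context(HOTELS, "Elm", in_chicago, by_rating))
    print(rank_in_context(HOTELS, "Elm", cheap_in_chicago, by_rating))
```

    The scan touches every object for each request, so it is useful only as a statement of the expected answer, not as a processing method.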

System source code

    The source code of RankSQL is provided as a patch file against PostgreSQL 7.4.3. To apply it, you need the source code of PostgreSQL 7.4.3 and the "applypatch" program. The code was developed many years ago and is provided as is, so some effort may be needed to get it working. If you have questions, please send them to cli [AT] uta [DOT] edu; I may be able to address them.

Publications

  • Chengkai Li. On Contextual Ranking Queries in Databases. In Information Systems, Volume 38, Issue 4, Pages 509–523, June 2013. PDF
  • Chengkai Li. Enabling Data Retrieval: By Ranking and Beyond. Ph.D. Dissertation. University of Illinois at Urbana-Champaign, 2007. PDF
  • Chengkai Li, Min Wang, Lipyeow Lim, Haixun Wang, and Kevin Chen-Chuan Chang. Supporting Ranking and Clustering as Generalized Order-By and Group-By. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 127-138, Beijing, China, June 2007. (acceptance rate 69/480=14%) PDF slides
  • Chengkai Li, Kevin Chen-Chuan Chang, and Ihab F. Ilyas. Supporting Ad-hoc Ranking Aggregates. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 61-72, Chicago, Illinois, USA, June 2006. (acceptance rate 58/446=13%) PDF slides
  • Chengkai Li, Mohamed Ali, Kevin Chen-Chuan Chang, and Ihab F. Ilyas. RankSQL: Supporting Ranking Queries in Relational Database Management Systems. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 1342-1345, Trondheim, Norway, August 2005. Demonstration description. (acceptance rate 29/69=42%) PDF
  • Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas, and Sumin Song. RankSQL: Query Algebra and Optimization for Relational Top-k Queries. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 131-142, Baltimore, Maryland, USA, June 2005. (acceptance rate 65/431=15%) PDF slides



500 UTA Boulevard
Engineering Research Building (ERB), Room 414
Arlington, TX 76019-0015

Email: cli@uta.edu

© The University of Texas at Arlington 2007-2019. All rights reserved.