CS2: A New Database Synopsis for Query Estimation
Document Type
Conference Proceeding
Publication Date
7-29-2013
Abstract
Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike state-of-the-art techniques, CS2 does not rely entirely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We show, both theoretically and empirically, that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.
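The abstract does not give the construction of CS2 or the form of the reverse estimator, so the following is only a minimal sketch of the general idea of correlated sampling for join size estimation, not the authors' method. It assumes a two-table join, a simple random sample drawn from one source table, and a plain scale-up estimator; the function names, the dictionary-based tuple layout, and the `pred` parameter are all hypothetical.

```python
import random

def build_correlated_sample(R, S, key_r, key_s, n, seed=0):
    """Sample n tuples from R, then keep every S tuple that joins with them.

    Illustrative assumption: R is the only table sampled at random; the S
    tuples are "correlated" because they are chosen by following the join.
    """
    rng = random.Random(seed)
    sample_r = rng.sample(R, n)  # simple random sample without replacement
    keys = {t[key_r] for t in sample_r}
    sample_s = [t for t in S if t[key_s] in keys]
    return sample_r, sample_s

def estimate_join_size(sample_r, sample_s, key_r, key_s, n_total,
                       pred=lambda r, s: True):
    """Scale the number of qualifying sample join results up to the full table.

    n_total is |R|. Each R tuple is sampled with probability len(sample_r) /
    n_total and brings all of its join partners, so the scaled count is an
    unbiased estimate of the result size under this sampling scheme.
    """
    matches = sum(
        1
        for r in sample_r
        for s in sample_s
        if r[key_r] == s[key_s] and pred(r, s)
    )
    return matches * n_total / len(sample_r)
```

Because the sampled S tuples are exactly the join partners of the sampled R tuples, join relationships survive in the synopsis, which independent random samples of R and S would not guarantee. A selection on either table can be passed as `pred` and is evaluated on the sample join results before scaling.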
DOI
10.1145/2463676.2463701
Montclair State University Digital Commons Citation
Yu, Feng; Hou, Wen Chi; Luo, Cheng; Che, Dunren; and Zhu, Michelle, "CS2: A New Database Synopsis for Query Estimation" (2013). Department of Computer Science Faculty Scholarship and Creative Works. 199.
https://digitalcommons.montclair.edu/compusci-facpubs/199