Optimizing SQL queries over text databases

Cited by: 10
Authors
Jain, Alpa [1]
Doan, AnHai [2]
Gravano, Luis [1]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Keywords
DOI
10.1109/ICDE.2008.4497472
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text documents often embed data that is structured in nature, and we can expose this structured data using information extraction technology. By processing a text database with information extraction systems, we can materialize a variety of structured "relations," over which we can then issue regular SQL queries. A key challenge in processing SQL queries in this text-based scenario is efficiency: information extraction is time-consuming, so query processing strategies should minimize the number of documents that they process. Another key challenge is result quality: in the traditional relational world, all correct execution strategies for a SQL query produce the same (correct) result; in contrast, a SQL query execution over a text database might produce answers that are not fully accurate or complete, for a number of reasons. To address these challenges, we study a family of select-project-join SQL queries over text databases, and characterize query processing strategies on their efficiency and, critically, on their result quality as well. We optimize the execution of SQL queries over text databases in a principled, cost-based manner, incorporating this tradeoff between efficiency and result quality in a user-specific fashion. Our large-scale experiments, over real data sets and multiple information extraction systems, show that our SQL query processing approach consistently picks appropriate execution strategies for the desired balance between efficiency and result quality.
Pages: 636 / +
Number of pages: 2