Securing Aggregate Queries for DNA Databases

Cited by: 8
Authors
Nassar, Mohamed [1 ,2 ]
Malluhi, Qutaibah [1 ,2 ]
Atallah, Mikhail [3 ]
Shikfa, Abdullatif [1 ,2 ]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
[2] Qatar Univ, KINDI Ctr Comp Res, Doha, Qatar
[3] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA)
Keywords
DNA databases; cloud security; secure outsourcing; data privacy
DOI
10.1109/TCC.2017.2682860
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper addresses the problem of sharing person-specific genomic sequences without violating the privacy of their data subjects to support large-scale biomedical research projects. The proposed method builds on the framework proposed by Kantarcioglu et al. [1] but extends the results in a number of ways. One improvement is that our scheme is deterministic, with zero probability of a wrong answer (as opposed to a low probability). We also provide a new operating point in the space-time tradeoff, by offering a scheme that is twice as fast as theirs but uses twice the storage space. This point is motivated by the fact that storage is cheaper than computation in current cloud computing pricing plans. Moreover, our encoding of the data makes it possible for us to handle a richer set of queries than exact matching between the query and each sequence of the database, including: (i) counting the number of matches between the query symbols and a sequence; (ii) logical OR matches where a query symbol is allowed to match a subset of the alphabet, thereby making it possible to handle (as a special case) a "not equal to" requirement for a query symbol (e.g., "not a G"); (iii) support for the extended alphabet of nucleotide base codes that encompasses ambiguities in DNA sequences (this happens on the DNA sequence side instead of the query side); (iv) queries that specify the number of occurrences of each kind of symbol in the specified sequence positions (e.g., two 'A' and four 'C' and one 'G' and three 'T', occurring in any order in the query-specified sequence positions); (v) a threshold query whose answer is 'yes' if the number of matches exceeds a query-specified threshold (e.g., "7 or more matches out of the 15 query-specified positions"). (vi) For all query types, we can hide the answers from the decrypting server, so that only the client learns the answer. (vii) In all cases, the client deterministically learns only the query's answer, except for query type (v) where we quantify the (very small) statistical leakage to the client of the actual count.
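The query types (i) to (v) listed in the abstract can be illustrated with a minimal plaintext sketch. This is not the paper's scheme: it performs no encryption and only shows what each query is meant to compute. The query representation (a dict from sequence position to a set of allowed bases), the function names, and the rule that an ambiguous sequence symbol matches when any base it stands for is allowed are all assumptions made for illustration. The threshold test uses "at least", following the abstract's "7 or more" example.

```python
from collections import Counter

# Standard IUPAC nucleotide codes mapped to the bases each one may stand for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def count_matches(sequence, query):
    """Query types (i)-(iii): count query positions that match.

    `query` maps a position to the set of bases allowed there (logical OR).
    Assumption: an ambiguous sequence symbol matches if any base it may
    stand for is in the allowed set.
    """
    return sum(1 for pos, allowed in query.items()
               if IUPAC[sequence[pos]] & allowed)


def occurrence_query(sequence, positions, required):
    """Query type (iv): symbol counts over `positions`, in any order."""
    return Counter(sequence[p] for p in positions) == Counter(required)


def threshold_query(sequence, query, threshold):
    """Query type (v): 'yes' if at least `threshold` positions match."""
    return count_matches(sequence, query) >= threshold


def aggregate_count(database, predicate):
    """Aggregate answer: how many sequences satisfy the predicate."""
    return sum(1 for seq in database if predicate(seq))


# Hypothetical toy data, not taken from the paper.
db = ["ACGTAC", "ACCTAG", "NCGTTC"]
# Position 0 must be A, position 2 must be G, position 5 must be "not a G".
query = {0: {"A"}, 2: {"G"}, 5: {"A", "C", "T"}}

per_sequence = [count_matches(s, query) for s in db]
n_over_threshold = aggregate_count(db, lambda s: threshold_query(s, query, 2))
```

In the setting the abstract describes, the client would learn only the final answer of each query; the sketch computes everything in the clear purely to pin down the query semantics.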
Pages: 827-837
Page count: 11