Lexis: An Optimization Framework for Discovering the Hierarchical Structure of Sequential Data

Cited by: 3
Authors
Siyari, Payam [1]
Dilkina, Bistra [1]
Dovrolis, Constantine [1]
Affiliations
[1] Georgia Tech, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA)
DOI
10.1145/2939672.2939741
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data represented as strings abounds in biology, linguistics, document mining, web search and many other fields. Such data often have a hierarchical structure, either because they were artificially designed and composed in a hierarchical manner or because there is an underlying evolutionary process that creates repeatedly more complex strings from simpler substrings. We propose a framework, referred to as Lexis, that produces an optimized hierarchical representation of a given set of "target" strings. The resulting hierarchy, "Lexis-DAG", shows how to construct each target through the concatenation of intermediate substrings, minimizing the total number of such concatenations or DAG edges. The Lexis optimization problem is related to the smallest grammar problem. After we prove its NP-hardness for two cost formulations, we propose an efficient greedy algorithm for the construction of Lexis-DAGs. We also consider the problem of identifying the set of intermediate nodes (substrings) that collectively form the "core" of a Lexis-DAG, which is important in the analysis of Lexis-DAGs. We show that the Lexis framework can be applied in diverse applications such as optimized synthesis of DNA fragments in genomic libraries, hierarchical structure discovery in protein sequences, dictionary-based text compression, and feature extraction from a set of documents.
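
The idea of building targets from shared intermediate substrings while keeping the number of DAG edges small can be illustrated with a minimal sketch. It is not the authors' algorithm: it is a simplified greedy heuristic written for this record, assuming the edge-count cost (every symbol on a node's right-hand side is one edge). The names greedy_lexis_sketch, edge_cost, expand, and the node labels "T0" and "<N0>" are hypothetical. At each step it picks the repeated substring whose replacement by a new intermediate node removes the most edges: a substring of length l with k non-overlapping occurrences saves k*l - (k + l) edges.

```python
def count_nonoverlapping(seq, pat):
    """Count left-to-right, non-overlapping occurrences of pat in seq."""
    count, i, m = 0, 0, len(pat)
    while i + m <= len(seq):
        if tuple(seq[i:i + m]) == pat:
            count += 1
            i += m
        else:
            i += 1
    return count


def replace_all(seq, pat, name):
    """Replace each left-to-right occurrence of pat in seq with the token name."""
    out, i, m = [], 0, len(pat)
    while i < len(seq):
        if tuple(seq[i:i + m]) == pat:
            out.append(name)
            i += m
        else:
            out.append(seq[i])
            i += 1
    return out


def edge_cost(nodes):
    """Total number of DAG edges: one per symbol on every right-hand side."""
    return sum(len(seq) for seq in nodes.values())


def expand(nodes, name):
    """Rebuild the string that a node stands for."""
    return "".join(expand(nodes, s) if s in nodes else s for s in nodes[name])


def greedy_lexis_sketch(targets):
    """Greedy sketch (assumption, not the paper's exact method)."""
    # Targets are labelled T0, T1, ...; sources are single characters.
    nodes = {f"T{i}": list(t) for i, t in enumerate(targets)}
    next_id = 0
    while True:
        # Enumerate every substring of length >= 2 in the current nodes.
        candidates = set()
        for seq in nodes.values():
            for i in range(len(seq)):
                for j in range(i + 2, len(seq) + 1):
                    candidates.add(tuple(seq[i:j]))
        best, best_saving = None, 0
        for pat in sorted(candidates):  # sorted for deterministic tie-breaking
            k = sum(count_nonoverlapping(seq, pat) for seq in nodes.values())
            saving = k * len(pat) - (k + len(pat))
            if saving > best_saving:
                best, best_saving = pat, saving
        if best is None:  # no replacement reduces the edge count
            return nodes
        name = f"<N{next_id}>"
        next_id += 1
        for key in list(nodes):
            nodes[key] = replace_all(nodes[key], best, name)
        nodes[name] = list(best)  # the new intermediate node


if __name__ == "__main__":
    dag = greedy_lexis_sketch(["abababab", "ababcc"])
    for node, rhs in dag.items():
        print(node, "->", rhs)
    print("edges:", edge_cost(dag))
```

Each iteration strictly lowers the edge count, so the loop terminates. The brute-force substring enumeration is only practical for short inputs; an efficient implementation would need a suitable index over repeated substrings.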
Pages: 1185-1194
Number of pages: 10