A method for similarity-based grouping of biological data

Cited by: 0
Authors
Jakoniene, Vaida [1]
Rundqvist, David [1]
Lambrix, Patrick [1]
Affiliations
[1] Linkoping Univ, Dept Comp & Informat Sci, SE-58183 Linkoping, Sweden
Keywords
DOI
Not available
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Similarity-based grouping of data entries in one or more data sources underlies many data management tasks, such as structuring search results, removing redundancy in databases, and data integration. In the context of life science data sources, similarity-based grouping of data entries is not a trivial task, as the stored data is complex, highly correlated, and represented at different levels of granularity. The contribution of this paper is two-fold: (1) we propose a method for similarity-based grouping, and (2) we show results from test cases. The main steps of the method are the specification of grouping rules, pairwise grouping between entries, the actual grouping of similar entries, and the evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of these choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of similarity-based grouping tasks and give deeper insights into the selected grouping tasks, the analyzed data source, and the influence of different strategies on the results.
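The four steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which similarity functions, grouping strategies, or evaluation measures the authors use, so everything below is an assumption chosen for illustration: the grouping rule is a Jaccard threshold on a hypothetical `terms` attribute, the grouping strategy is connected components over similar pairs, and the evaluation is pairwise precision and recall against a given classification. The entries and labels are made-up toy data, not results from the paper.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def pairwise_similar(entries, rule):
    """Pairwise step: index pairs (i, j), i < j, that satisfy the grouping rule."""
    return [(i, j) for i, j in combinations(range(len(entries)), 2)
            if rule(entries[i], entries[j])]

def group(n, pairs):
    """Grouping step (assumed strategy): connected components via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def pair_precision_recall(groups, labels):
    """Evaluation step (assumed measure): compare co-grouped pairs with
    pairs that share a label in the given classification."""
    predicted = {p for g in groups for p in combinations(g, 2)}
    expected = {(i, j) for i, j in combinations(range(len(labels)), 2)
                if labels[i] == labels[j]}
    tp = len(predicted & expected)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(expected) if expected else 1.0
    return precision, recall

# Toy entries (hypothetical): each entry carries a set of annotation terms.
entries = [
    {"id": "e1", "terms": {"kinase", "atp", "binding"}},
    {"id": "e2", "terms": {"kinase", "atp", "transferase"}},
    {"id": "e3", "terms": {"transferase", "atp", "kinase", "signal"}},
    {"id": "e4", "terms": {"membrane", "transport"}},
    {"id": "e5", "terms": {"membrane", "transport", "ion"}},
]
labels = ["A", "A", "B", "C", "C"]  # hypothetical reference classification

# Grouping rule (assumed): two entries are similar if Jaccard >= 0.5.
def rule(x, y):
    return jaccard(x["terms"], y["terms"]) >= 0.5

pairs = pairwise_similar(entries, rule)
groups = group(len(entries), pairs)
precision, recall = pair_precision_recall(groups, labels)
print(groups, precision, recall)
```

In this toy run, e1 and e3 are not directly similar but end up in the same group through e2, which lowers precision against the reference classification. Swapping the rule or the grouping strategy (for example, requiring every pair in a group to be similar) changes the outcome, which is the kind of influence the abstract says the method is meant to expose.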
Pages: 136-151
Number of pages: 16