On the need for structure modelling in sequence prediction

Times cited: 3
Authors
Twomey, Niall [1]
Diethe, Tom [1]
Flach, Peter [1]
Affiliation
[1] Univ Bristol, Intelligent Syst Lab, Bristol, Avon, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Classification Performance; Probability Estimate; Activity Recognition; Conditional Random Field; Brier Score
DOI
10.1007/s10994-016-5571-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
There is no uniform approach in the literature for modelling sequential correlations in sequence classification problems. It is easy to find examples of unstructured models (e.g. logistic regression) where correlations are not taken into account at all, but there are also many examples where the correlations are explicitly incorporated into a (potentially computationally expensive) structured classification model (e.g. conditional random fields). In this paper we lay theoretical and empirical foundations for clarifying the types of problems that necessitate direct modelling of correlations in sequences, and the types of problems where unstructured models that capture sequential aspects solely through features are sufficient. The theoretical work in this paper shows that the rate of decay of auto-correlations within a sequence is related to the excess classification risk incurred by ignoring the structural aspect of the data. This is an intuitively appealing result, demonstrating the intimate link between auto-correlations and excess classification risk. Drawing directly on this theory, we develop well-founded visual analytics tools that can be applied a priori to data sequences, and we demonstrate how these tools can guide practitioners in specifying feature representations based on auto-correlation profiles. Empirical analysis is performed on three sequential datasets. With baseline feature templates, structured and unstructured models achieve similar performance, indicating no initial preference for either model. We then apply the visual analytics tools to the datasets and show that, in all cases, classification performance improves over the baseline results when our tools are used to define feature representations.
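The abstract's central quantity is the auto-correlation profile of a label sequence, inspected before any model is trained. The Python sketch below shows one way such a profile could be computed; it is an illustration, not the authors' visual analytics tool. Assumptions: the function names `autocorrelation`, `label_autocorrelation` and `decay_lag` are hypothetical, each class is treated as a binary indicator sequence, the standard biased ACF estimator is used, and the 0.1 decay threshold is an arbitrary illustrative value, not one taken from the paper.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1-D sequence at lags 0..max_lag.

    Biased estimator: each lag-k cross-product sum is divided by the
    lag-0 sum, so acf[0] == 1 and long-lag values are damped because
    fewer pairs contribute.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(x) - 1]")
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])


def label_autocorrelation(labels, target, max_lag):
    """ACF of the binary indicator 'label == target' (one profile per class)."""
    indicator = (np.asarray(labels) == target).astype(float)
    return autocorrelation(indicator, max_lag)


def decay_lag(acf, threshold=0.1):
    """Hypothetical heuristic: first lag at which |ACF| drops below threshold.

    Returns None if the profile never falls below the threshold within
    the computed lags.
    """
    below = np.flatnonzero(np.abs(acf) < threshold)
    return int(below[0]) if below.size else None


if __name__ == "__main__":
    # Toy activity sequence: 10 steps of "sit", then 10 of "walk", repeated 5 times.
    labels = (["sit"] * 10 + ["walk"] * 10) * 5
    acf = label_autocorrelation(labels, "walk", max_lag=10)
    print(np.round(acf[:6], 2))
    print(decay_lag(acf))  # 5
```

On this toy sequence the profile falls by 0.19 per lag and first drops below 0.1 at lag 5. If one assumes that fast decay means distant observations carry little extra label information, such a lag could serve as a starting point for the width of window features in an unstructured model; the paper's own tools and selection criteria may differ.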
Pages: 291-314
Number of pages: 24
Source: Machine Learning, 2016, 104: 291-314