Detecting irrelevant subtrees to improve probabilistic learning from tree-structured data

Authors
Habrard, A [1]
Bernard, M [1]
Sebban, M [1]
Affiliation
[1] Univ St Etienne, EURISE, F-42023 St Etienne 2, France
Keywords
data reduction; tree-structured data; noisy data; stochastic tree automata
DOI
Not available
Chinese Library Classification
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
With the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims to prune noisy or irrelevant subtrees from a set of trees a priori. The originality of this approach, compared with classic data reduction techniques, is that only part of a tree (i.e., a subtree) can be deleted, rather than the whole tree. Our method is based on confidence intervals, defined on a partition of subtrees and computed according to a given probability distribution. We propose an original approach to assess these intervals on tree-structured data, and we show experimentally that it is beneficial in the presence of noise.
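To make the idea of deleting only part of a tree concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the partition of subtrees, the probability distribution, or the stochastic tree automaton involved. As a stand-in, this sketch assumes that trees are nested `(label, children)` tuples, estimates each subtree's relative frequency in the sample, and removes a child subtree when the upper end of a Wald (normal-approximation) confidence interval for that frequency falls below a threshold. The names `subtrees`, `upper_bound`, `prune`, `prune_forest`, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter


def subtrees(tree):
    """Yield every subtree of a (label, children) tuple tree, root included."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def upper_bound(count, total, z=1.96):
    """Upper end of a Wald interval for a proportion (assumed stand-in test)."""
    p = count / total
    return min(1.0, p + z * math.sqrt(p * (1 - p) / total))


def prune(tree, counts, total, threshold):
    """Return a copy of tree without the child subtrees judged too rare.

    Only children are candidates for removal, so the root always survives:
    part of a tree is deleted, never the whole tree.
    """
    label, children = tree
    kept = tuple(
        prune(child, counts, total, threshold)
        for child in children
        if upper_bound(counts[child], total) >= threshold
    )
    return (label, kept)


def prune_forest(forest, threshold=0.1):
    """Count all subtree occurrences in the forest, then prune each tree."""
    counts = Counter(s for tree in forest for s in subtrees(tree))
    total = sum(counts.values())
    return [prune(tree, counts, total, threshold) for tree in forest]


# Hypothetical sample: 20 identical trees plus one with a rare leaf "x".
clean = ("r", (("a", ()), ("b", ())))
noisy = ("r", (("a", ()), ("x", ())))
pruned = prune_forest([clean] * 20 + [noisy])
```

In this sample, the leaf `x` occurs once among 63 subtree occurrences, so its interval stays below the assumed threshold and only that leaf is removed; the rest of the noisy tree and all clean trees are kept.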
Pages: 103-130
Number of pages: 28