Bayesian Constituent Context Model for Grammar Induction

Cited by: 1
Authors
Zhang, Min [1 ]
Duan, Xiangyu [1 ]
Chen, Wenliang [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bayesian; constituent context model; grammar induction; smoothing;
DOI
10.1109/TASLP.2013.2294584
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
The Constituent Context Model (CCM) is an effective generative model for grammar induction, whose aim is to induce hierarchical syntactic structure from natural text. The CCM simply defines a multinomial distribution over constituents, which leads to a severe data sparsity problem because long constituents are unlikely to appear in unseen data. This paper proposes a Bayesian method for constituent smoothing that places two kinds of prior distributions over constituents: a Dirichlet prior and a Pitman-Yor Process (PYP) prior. The Dirichlet prior acts as an additive smoothing method, and the PYP prior acts as a back-off smoothing method. Furthermore, a modified CCM is proposed that differentiates left constituents from right constituents in binary branching trees. Experiments show that both the proposed Bayesian smoothing method and the modified CCM are effective, and that combining them matches or significantly improves state-of-the-art grammar induction performance on standard treebanks of various languages.
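The two priors in the abstract correspond to two classic smoothing schemes. The following is a minimal sketch, not the paper's implementation: the function names, the uniform base distribution, and the hyperparameter values are illustrative assumptions. It contrasts Dirichlet (additive) smoothing with a simplified one-table-per-type Pitman-Yor back-off over toy constituent counts:

```python
def dirichlet_smoothed(counts, vocab_size, alpha=1.0):
    """Additive (Dirichlet-prior) smoothing: every constituent type,
    seen or unseen, receives a pseudo-count of alpha."""
    total = sum(counts.values()) + alpha * vocab_size
    return lambda c: (counts.get(c, 0) + alpha) / total

def pyp_backoff(counts, base_prob, discount=0.5, strength=1.0):
    """Simplified Pitman-Yor-style back-off (one table per seen type):
    each observed count is discounted, and the freed mass is redistributed
    through a base distribution (e.g. one estimated from shorter spans)."""
    n = sum(counts.values())
    t = len(counts)  # number of distinct seen types ("tables")
    def prob(c):
        seen = max(counts.get(c, 0) - discount, 0.0)
        return (seen + (strength + discount * t) * base_prob(c)) / (n + strength)
    return prob

# Toy constituent counts over a 4-symbol vocabulary {a, b, c, d}
counts = {"a": 3, "b": 1}
p_dir = dirichlet_smoothed(counts, vocab_size=4)
p_pyp = pyp_backoff(counts, base_prob=lambda c: 1.0 / 4)
```

Both estimators assign nonzero probability to the unseen types "c" and "d"; the Pitman-Yor version additionally shifts mass away from rare seen types toward the base distribution, which is the back-off behavior the paper exploits for long constituents.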
Pages: 531-541
Page count: 11