Bayesian Constituent Context Model for Grammar Induction

Cited by: 1
Authors
Zhang, Min [1]
Duan, Xiangyu [1]
Chen, Wenliang [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bayesian; constituent context model; grammar induction; smoothing;
DOI
10.1109/TASLP.2013.2294584
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206 ; 082403 ;
Abstract
The Constituent Context Model (CCM) is an effective generative model for grammar induction, the task of inducing hierarchical syntactic structure from natural text. The CCM defines a plain multinomial distribution over constituents, which causes a severe data sparsity problem because long constituents are unlikely to recur in unseen data. This paper proposes a Bayesian method for constituent smoothing that places one of two prior distributions over constituents: a Dirichlet prior or a Pitman-Yor Process (PYP) prior. The Dirichlet prior acts as an additive smoothing method, and the PYP prior acts as a back-off smoothing method. In addition, a modified CCM is proposed that distinguishes left constituents from right constituents in binary branching trees. Experiments show that both the Bayesian smoothing method and the modified CCM are effective, and that combining them matches or significantly improves on state-of-the-art grammar induction performance on standard treebanks of various languages.
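To make the two smoothing behaviours named in the abstract concrete, here is a minimal Python sketch of the textbook predictive probabilities under a symmetric Dirichlet prior (additive smoothing) and under a Pitman-Yor Process (discounting plus back-off to a base distribution). It is not the paper's implementation: the function names, the toy constituent counts, the assumption of four possible constituent types, the one-table-per-type seating, the uniform base distribution, and all hyperparameter values are illustrative assumptions.

```python
from collections import Counter


def dirichlet_predictive(counts, num_types, alpha):
    """Predictive probability under a symmetric Dirichlet(alpha) prior.

    This is additive smoothing: every type, seen or unseen, gets alpha
    pseudo-counts added to its observed count.
    """
    total = sum(counts.values())

    def prob(item):
        return (counts.get(item, 0) + alpha) / (total + alpha * num_types)

    return prob


def pyp_predictive(counts, tables, discount, strength, base_prob):
    """Pitman-Yor predictive probability (Chinese restaurant view).

    Each seen type loses discount * (its table count) of mass; the freed
    mass plus the strength term is handed to the base distribution, which
    is the back-off step. Assumes strength > 0.
    """
    total = sum(counts.values())
    total_tables = sum(tables.values())

    def prob(item):
        c = counts.get(item, 0)
        t = tables.get(item, 0)
        discounted = max(c - discount * t, 0.0) / (strength + total)
        backoff_weight = (strength + discount * total_tables) / (strength + total)
        return discounted + backoff_weight * base_prob(item)

    return prob


# Toy data (hypothetical): POS-tag sequences standing in for constituents.
types = ["DT NN", "DT JJ NN", "DT JJ JJ NN", "DT NN NN"]
counts = Counter({"DT NN": 3, "DT JJ NN": 1})
tables = {"DT NN": 1, "DT JJ NN": 1}  # assumption: one table per seen type


def uniform(item):
    # Assumed base distribution: uniform over the four toy types.
    return 1.0 / len(types)


p_dir = dirichlet_predictive(counts, num_types=len(types), alpha=0.5)
p_pyp = pyp_predictive(counts, tables, discount=0.5, strength=1.0, base_prob=uniform)

for item in types:
    print(f"{item:12s} dirichlet={p_dir(item):.4f} pyp={p_pyp(item):.4f}")
```

In both cases the unseen long constituents receive non-zero probability, which is the sparsity issue the abstract describes. A more informative base distribution than the uniform one assumed here would change where the PYP back-off mass goes.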
Pages: 531-541
Page count: 11