A Discourse-based Chinese Chunkbank

Authors
Lu L. [1]
Jiao H.-Y. [1]
Li M. [1]
Xun E.-D. [1]
Affiliation
[1] College of Information Science, Beijing Language and Culture University, Beijing
Funding
National Social Science Fund of China
Keywords
chunk; corpus annotation; syntactic parsing; treebank
DOI
10.16383/j.aas.c190828
Abstract
To provide large-scale annotation of Chinese functional chunks for linguistic research and syntactic parsing, we present a method for quickly building a high-quality, multi-domain, discourse-based Chinese chunkbank. First, we use punctuation, syntax, and the expressive functions of VPs and NPs to segment complex sentences into independent simple sentences. Second, based on the syntactic, textual, discourse, and interpersonal functions of chunks, we design 4 phrase tags, 8 functional tags, and 4 sentence boundary tags to describe the chunks, which are classified into 3 types and 5 kinds. Third, annotators annotate the skeleton structure and highlight the head word of the predicate for every simple sentence. To date, we have annotated more than 10 million Chinese characters, including 9 thousand skeleton structures for 60 thousand sentences. The chunkbank covers a range of text genres, including Baidu Baike, internet news, and patents. We also explore an effective model for managing crowdsourced data. © 2022 Science Press. All rights reserved.
Pages: 2911-2921
Number of pages: 10