Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus

Times cited: 0

Authors:

Li, Xuansong [1]

Palmer, Martha

Xue, Nianwen

Ramshaw, Lance

Maamouri, Mohamed

Bies, Ann

Conger, Kathryn Summerville

Grimes, Stephen

Strassel, Stephanie

Affiliation:

[1] Univ Penn, Linguist Data Consortium, Philadelphia, PA 19104 USA

Source:

LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016

Keywords:

machine translation; parallel aligned Treebank; word alignment; PropBank; co-reference

DOI:

Not available

Chinese Library Classification (CLC) number:

H [Language and Writing]

Subject classification code:

05

Abstract:

High accuracy in automated translation and information retrieval calls for linguistic annotation at multiple language levels. The plethora of informal internet content has sparked demand for porting state-of-the-art natural language processing (NLP) applications to new social media genres and adapting them to diverse languages. An effort launched by the BOLT (Broad Operational Language Translation) program at DARPA (Defense Advanced Research Projects Agency) successfully addressed such internet content with enhanced NLP systems. BOLT aims at automated translation and linguistic analysis of informal genres of text and speech in online and in-person communication. As part of this program, the Linguistic Data Consortium (LDC) developed linguistic resources to support the training and evaluation of these new technologies. This paper focuses on the methodologies, infrastructure, and procedures for developing linguistic annotation at multiple language levels, including Treebank (TB), word alignment (WA), PropBank (PB), and co-reference (CoRef). Inspired by the OntoNotes approach, with adaptations that reflect the goals and scope of the BOLT project, this effort introduced additional annotation types for informal, free-style genres in English, Chinese, and Egyptian Arabic. The resulting corpus is by far the largest multi-lingual, multi-level, and multi-genre annotation corpus of informal text and speech.

Pages: 906-913

Number of pages: 8