Sentence Alignment of Bilingual Survey Texts Applying a Metadata-Aware Strategy

Cited by: 0

Authors:

Sorato, Danielly ^{[1]}

Zavala-Rojas, Diana ^{[1,2]}

Affiliations:

[1] Univ Pompeu Fabra, Res & Expertise Ctr Survey Methodol, Barcelona, Spain

[2] European Social Survey ERIC, London, England

Source:

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022) | 2022 / Vol. 13286

Keywords:

Sentence alignment; Survey translation; Metadata

DOI:

10.1007/978-3-031-08473-7_43

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Sentence alignment is a crucial task in building parallel corpora, and off-the-shelf sentence alignment tools generally perform well. However, in certain cases, depending on factors such as sentence structure and the amount of contextual information, the task can be challenging and require further resources that may be difficult to find, such as domain-specific bilingual dictionaries. Although investing in additional linguistic resources is frequently the chosen option in these circumstances, leveraging extralinguistic information such as sentence-level metadata can be an easier way to narrow the alignment search space. This paper presents a method for aligning the texts of bilingual survey questionnaires that leverages sentence-level metadata annotations. To measure the performance of our sentence aligner, we build eight gold standards in four distinct languages: Catalan, French, Portuguese, and Spanish.

Pages: 469-476

Page count: 8