Sentence Alignment of Bilingual Survey Texts Applying a Metadata-Aware Strategy

Cited by: 0
Authors
Sorato, Danielly [1]
Zavala-Rojas, Diana [1, 2]
Affiliations
[1] Univ Pompeu Fabra, Res & Expertise Ctr Survey Methodol, Barcelona, Spain
[2] European Social Survey ERIC, London, England
Keywords
Sentence alignment; Survey translation; Metadata
DOI
10.1007/978-3-031-08473-7_43
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentence alignment is a crucial task in the process of building parallel corpora. Off-the-shelf tools for sentence alignment generally perform well to this end. However, in certain cases, depending on factors such as the sentence structure and the amount of contextual information, the sentence alignment task can be challenging and require further resources that may be difficult to find, such as domain-specific bilingual dictionaries. Although investing in creating additional linguistic resources is frequently the chosen option in these circumstances, leveraging extralinguistic information such as sentence-level metadata can be an easier alternative to narrow the alignment search space. This paper presents a method designed for the alignment of bilingual survey questionnaire texts, which leverages sentence-level metadata annotations. We build eight gold standards in four distinct languages, namely Catalan, French, Portuguese, and Spanish, to measure the performance of our sentence aligner.
Pages: 469-476
Page count: 8