Longest Sorted Sequence algorithm for parallel text alignment

Authors
Ildefonso, T [1]
Lopes, GP [1]
Affiliation
[1] Univ Nova Lisboa, Fac Ciencias & Tecnol, CITI, P-2829516 Caparica, Portugal
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a statistically supported, language-independent method for aligning parallel texts (texts that are translations of each other, or of a common source text). The approach builds on previous work by Ribeiro et al. (2000): the second statistical filter proposed there, which is based on Confidence Bands (CB), is replaced by the Longest Sorted Sequence algorithm (LSSA), which is described in this paper. For Portuguese-French alignments, this yielded a 35% decrease in processing time and an 18% increase in the number of aligned segments; similar results were obtained for Portuguese-English alignments. Both methods are compared and evaluated on a large parallel corpus of Portuguese, English and French texts (approximately 250Mb of text per language).
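The abstract does not spell out how LSSA works. As a rough illustration only, the sketch below assumes that it behaves like the classic longest increasing subsequence problem applied to candidate correspondence points: each point pairs an offset in the source text with an offset in the target text, and the filter keeps the longest chain in which both offsets strictly increase. The function name `longest_sorted_sequence`, the `(source_offset, target_offset)` tuple layout, and the sample data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def longest_sorted_sequence(points):
    """Return the longest chain of (source, target) points in which both
    offsets strictly increase. Assumed reading of LSSA, not the paper's code."""
    # Sort by source offset; ties are ordered by descending target offset so
    # that two points sharing a source offset can never both be selected.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))

    tails = []       # tails[k]: smallest target offset ending a chain of length k + 1
    tails_idx = []   # index in `ordered` of the point that ends that chain
    prev = [None] * len(ordered)  # back-pointers used to rebuild the chain

    for i, (_, tgt) in enumerate(ordered):
        k = bisect_left(tails, tgt)  # first chain whose tail is >= tgt
        if k == len(tails):
            tails.append(tgt)
            tails_idx.append(i)
        else:
            tails[k] = tgt
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else None

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Hypothetical candidate points; (20, 95) and (50, 5) cross the main diagonal.
candidates = [(10, 12), (20, 95), (30, 28), (40, 41), (50, 5), (60, 66)]
print(longest_sorted_sequence(candidates))
```

In this toy input the two crossing points are dropped and the remaining four form the sorted chain. The binary search keeps the running time at O(n log n) in the number of candidate points; whether the paper's algorithm uses the same strategy is not stated in the abstract.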
Pages: 81-90
Number of pages: 10