A Hybrid Method for Word Segmentation with English-Vietnamese Bilingual Text

Authors
Quoc Hung Ngo [1]
Dinh Dien [2]
Winiwarter, Werner [3]
Affiliations
[1] Univ Informat Technol, Fac Comp Sci, Ho Chi Minh City, Vietnam
[2] Univ Sci, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Univ Vienna, Res Grp Data Analyt & Comp, A-1090 Vienna, Austria
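Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract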
This paper proposes a hybrid approach to Vietnamese word segmentation. The approach combines a dictionary-based method with a machine learning method to detect word boundaries in Vietnamese text by comparing English-Vietnamese pairs. We also point out several characteristics of Vietnamese that affect both the Vietnamese word segmentation task and the word alignment of English-Vietnamese text. Moreover, we built an English-Vietnamese bilingual corpus of nearly 10 million words, named EVBCorpus; a part of EVBNews has been manually segmented at the word level. We evaluate the performance of our approach by comparing its word segmentation results on this corpus. Our hybrid approach achieves 97% accuracy on the EVBNews corpus.
Pages: 5