Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus

Cited by: 2
Authors
Chen, Liang-Ching [1, 2]
Chang, Kuei-Hu [3, 4]
Yang, Shu-Ching [2]
Affiliations
[1] ROC Mil Acad, Dept Foreign Languages, Kaohsiung, Taiwan
[2] Natl Sun Yat Sen Univ, Inst Educ, Kaohsiung, Taiwan
[3] ROC Mil Acad, Dept Management Sci, Kaohsiung, Taiwan
[4] Asia Univ, Inst Innovat & Circular Econ, Taichung, Taiwan
Keywords
Information, communication and technology (ICT); corpus-based approach; natural language data (NLD); natural language processing (NLP); military; LANGUAGE; ALGORITHMS
DOI
10.4025/actascitechnol.v44i1.60486
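Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract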
In modern information, communication and technology (ICT), efficient and accurate corpus-based approaches for processing natural language data (NLD) are critical. Traditional corpus-based approaches to processing a corpus (i.e., the collected NLD) mainly focus on quantifying and ranking words to assist humans in extracting keywords. However, they cannot identify the meanings behind the words, and so cannot properly extract terminology or the information associated with it. To address this issue, this paper proposes an integrated linguistic analysis approach that combines two corpus-based approaches with a rule-based natural language processing (NLP) approach. The approach extracts and identifies terminology and creates a text database, and then uses the terms as channels to retrieve deeper, domain-oriented core information from the target corpus. The military domain is an uncommon research field whose data are often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach for extracting terminology and its associated information, the researchers adopt US Army Field Manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn's hybrid approach, the results indicate that only the proposed approach handles all three tasks: terminology identification, text database creation, and domain knowledge extraction.
Pages: 10