Corpus-Based Vocabulary List for Thai Language

Cited by: 1

Authors:

Ketmaneechairat, Hathairat [1]

Maliyaem, Maleerat [2]

Affiliations:

[1] King Mongkuts Univ Technol, Coll Ind Technol, North Bangkok, Thailand

[2] King Mongkuts Univ Technol, Informat Technol & Digital Innovat, North Bangkok, Thailand

Source:

JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY | 2023, Vol. 14, No. 2

Keywords:

corpus-based vocabulary; Thai language; frequency of words; statistical data

DOI:

10.12720/jait.14.2.319-327

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In natural language processing, a corpus is important both for training models and for the algorithms that create machine learning models. This paper describes the design and process of creating a corpus-based vocabulary for the Thai language that can serve as a main corpus for natural language processing research. The corpus is built in accordance with the rules of the language, using the actual Word Usage Frequency (WUF) analyzed from a text corpus covering several types of content. The results present usage frequencies for several characteristics, namely word frequency, character frequency, and character bigram frequency, which are used in this research and provide important information for further NLP research. Based on the findings, it was concluded that the average word length increases as the number of words in the corpus increases; that is, word length and word frequency are correlated in the same direction.
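The three frequency measures named in the abstract (word, character, and character bigram frequency) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `frequency_profiles` and `average_word_length` are hypothetical, the input is assumed to be already segmented into words (Thai is written without spaces, so a word segmenter would be needed upstream), bigrams are assumed to be counted within words only, and characters are counted as Unicode code points rather than grapheme clusters.

```python
from collections import Counter


def frequency_profiles(tokens):
    """Count words, characters, and character bigrams in pre-segmented tokens.

    Assumption: `tokens` is a list of words produced by some Thai word
    segmenter; characters are Unicode code points, not grapheme clusters.
    """
    word_freq = Counter(tokens)
    char_freq = Counter(ch for tok in tokens for ch in tok)
    # Bigrams are taken inside each word, never across word boundaries.
    bigram_freq = Counter(
        tok[i:i + 2] for tok in tokens for i in range(len(tok) - 1)
    )
    return word_freq, char_freq, bigram_freq


def average_word_length(tokens):
    """Mean token length in code points; 0.0 for an empty list."""
    return sum(len(tok) for tok in tokens) / len(tokens) if tokens else 0.0


# Toy, hand-made token list (illustrative only, not data from the paper).
tokens = ["กา", "มา", "กา", "ตา"]
word_freq, char_freq, bigram_freq = frequency_profiles(tokens)
print(word_freq.most_common(1))
print(char_freq.most_common(1))
print(average_word_length(tokens))
```

Counting code points keeps the sketch dependency-free; a study that treats a consonant plus its vowel or tone marks as one unit would need grapheme-cluster segmentation instead.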


Pages: 319-327

Page count: 9

50 entries in total

[41] Language Problems and Language Planning A corpus-based historical investigation
Li, Wenwen
Liu, Haitao
LANGUAGE PROBLEMS & LANGUAGE PLANNING, 2013, 37 (02) : 151 - 177
[42] A Corpus-based Study of The vocabulary collocation in Adult EFL Learners' Writing
王宇
校园英语, 2019, (40) : 210 - 211
[43] How to trace the growth in learners active vocabulary? A corpus-based study
Lenko-Szymanska, A
TEACHING AND LEARNING BY DOING CORPUS ANALYSIS, 2002, (42) : 217 - 230
[44] A Corpus-based Study of Vocabulary in the New Concept English Textbook Series
Yang, Lu
Coxhead, Averil
RELC JOURNAL, 2022, 53 (03) : 597 - 611
[45] An Exploration into the Corpus-Based Approach to the Learning and Teaching of College English Vocabulary
Shang, Xu-Ming
2015 2ND INTERNATIONAL CONFERENCE ON EDUCATION AND SOCIAL DEVELOPMENT, ICESD 2015, 2015, : 726 - 730
[46] Computer-assisted Corpus-based College English Vocabulary Teaching
Liu Boru
PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI, 2012, : 1864 - 1867
[47] Academic Vocabulary in Applied Linguistics Research Articles: A Corpus-Based Study
Xodabande, Ismail
Torabzadeh, Shima
Ghafouri, Mohammad
Emadi, Azadeh
JOURNAL OF LANGUAGE AND EDUCATION, 2022, 8 (02) : 154 - 164
[48] An Investigation on Non-English Majors' Corpus-based Vocabulary Learning
Hua Min
PROCEEDINGS OF 2015 INTERNATIONAL SYMPOSIUM - COLLEGE FOREIGN LANGUAGES EDUCATION REFORM AND INNOVATION, 2015, : 234 - 237
[49] USING NLP TO CREATE CORPUS-BASED VOCABULARY EXERCISES IN LATIN CLASSES
Beyer, A.
Schulz, K.
14TH INTERNATIONAL TECHNOLOGY, EDUCATION AND DEVELOPMENT CONFERENCE (INTED2020), 2020, : 1750 - 1757
[50] A corpus-based study of vocabulary in massive open online courses (MOOCs)
Liu, Chen-Yu
ENGLISH FOR SPECIFIC PURPOSES, 2023, 72 : 40 - 50
