Construction of Scholarly n-Gram from Huge Text Data

Cited by: 1
Authors
Hwang, Myunggwon [1]
Hwang, Mi-Nyeong [1]
Yeom, Ha-Neul [2]
Jung, Hanmin [1]
Affiliations
[1] KISTI, Daejeon, South Korea
[2] Univ Sci & Technol, KISTI, Daejeon, South Korea
Keywords
scholarly n-gram; context n-gram; time-dependent n-gram; personalized n-gram
DOI
10.1109/IMIS.2014.4
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The ultimate goal of this research is to provide n-gram data specialized for scholarly use. To this end, this paper outlines the construction of a scholarly n-gram by processing large text documents. Many researchers, especially non-native English speakers, find it difficult to construct sentences and paragraphs with appropriate, unambiguous words. Providing n-gram data is one way to assist them. A representative n-gram data set, Web 1T 5-Gram Version 1, already exists; it was constructed by processing virtually all documents retrieved using Google. However, because this data yields unfocused word recommendations, it is not well suited to this purpose. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-grams using the Web 1T unigram data, and we introduce and discuss the specifics of our research plan for the scholarly n-gram.
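As an illustration of the idea the abstract describes, the sketch below counts word n-grams in a small text and uses the counts to recommend a next word. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `count_ngrams` and `suggest_next`, the tokenization rule, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter


def count_ngrams(text, n):
    """Count word n-grams in `text` without crossing sentence boundaries.

    Assumption: lowercasing plus a simple regex tokenizer is good enough
    for illustration; a real corpus would need more careful handling.
    """
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def suggest_next(counts, prefix):
    """Rank the words that follow `prefix` by how often they were seen.

    `prefix` is a tuple one word shorter than the n-grams in `counts`.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    candidates = [
        (gram[-1], freq) for gram, freq in counts.items() if gram[:-1] == prefix
    ]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical sample text standing in for a scholarly corpus.
sample = "We propose a method. We propose a model. We evaluate a method."
bigrams = count_ngrams(sample, 2)
print(suggest_next(bigrams, ("a",)))
```

Restricting the input corpus to scholarly documents is what would focus such recommendations on scholarly usage; the counting step itself stays the same.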
Pages: 31-35
Page count: 5
Related papers
50 in total
  • [1] Domain N-Gram Construction and Its Application to Text Editor
    Hwang, Myunggwon
    Choi, Dongjin
    Lee, Hyogap
    Kim, Pankoo
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2011, PT I, 2011, 6591 : 268 - 277
  • [2] N-gram Analysis of a Mongolian Text
    Altangerel, Khuder
    Tsend, Ganbat
    Jalsan, Khash-Erdene
    [J]. IFOST 2008: PROCEEDING OF THE THIRD INTERNATIONAL FORUM ON STRATEGIC TECHNOLOGIES, 2008, : 258 - 259
  • [3] SEARCHING FOR TEXT - SEND AN N-GRAM
    KIMBRELL, RE
    [J]. BYTE, 1988, 13 (05): : 297 - &
  • [4] Text mining with n-gram variables
    Schonlau, Matthias
    Guenther, Nick
    Sucholutsky, Ilia
    [J]. STATA JOURNAL, 2017, 17 (04): : 866 - 881
  • [5] Efficient n-gram construction for text categorization using feature selection techniques
    Garcia, Maximiliano
    Maldonado, Sebastian
    Vairetti, Carla
    [J]. INTELLIGENT DATA ANALYSIS, 2021, 25 (03) : 509 - 525
  • [6] Short Text Clustering using Numerical data based on N-gram
    Kumar, Rajiv
    Mathur, Robin Prakash
    [J]. 2014 5TH INTERNATIONAL CONFERENCE CONFLUENCE THE NEXT GENERATION INFORMATION TECHNOLOGY SUMMIT (CONFLUENCE), 2014, : 274 - 276
  • [7] Are n-gram Categories Helpful in Text Classification?
    Kruczek, Jakub
    Kruczek, Paulina
    Kuta, Marcin
    [J]. COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138 : 524 - 537
  • [8] A Neural N-Gram Network for Text Classification
    Yan, Zhenguo
    Wu, Yue
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2018, 22 (03) : 380 - 386
  • [9] N-Gram Based Paraphrase Generator from Large text Document
    Gadag, Ashwini I.
    Sagar, B. M.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMPUTATION SYSTEM AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTIONS (CSITSS), 2016, : 91 - 94
  • [10] N-GRAM ANALYSIS OF TEXT DOCUMENTS IN SERBIAN LANGUAGE
    Marovac, Ulfeta
    Pljaskovic, Aldina
    Crnisanin, Adela
    Kajan, Ejub
    [J]. 2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012, : 1385 - 1388