Comparative study of term-weighting schemes for environmental big data using machine learning

Cited by: 4
Authors
Kim, JungJin [1]
Kim, Han-Ul [2]
Adamowski, Jan [3]
Hatami, Shadi [3]
Jeong, Hanseok [1, 4, 5]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Inst Environm Technol, Seoul 01811, South Korea
[2] Seoul Natl Univ Sci & Technol, Dept Appl Artificial Intelligence, Seoul 01811, South Korea
[3] McGill Univ, Dept Bioresource Engn, Ste Anne De Bellevue, PQ, Canada
[4] Seoul Natl Univ Sci & Technol, Dept Environm Engn, Seoul 01811, South Korea
[5] 120-1 Chungun Hall 232 Gongneung ro, Seoul 01811, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Text classification; Environmental digital news; Term-weighting schemes; Feature selection; TEXT; CLASSIFICATION; FRAMEWORK;
DOI
10.1016/j.envsoft.2022.105536
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Widely used term-weighting schemes and machine learning (ML) classifiers with default parameter settings were assessed for their performance when applied to environmental big data analysis. Five term-weighting schemes [term frequency (TF), TF-inverse document frequency (TF-IDF), Best Match 25 (BM25), TF-inverse gravity moment (TF-IGM), and TF-IDF-inverse class frequency (TF-IDF-ICF)] and five ML classifiers [support vector machine (SVM), Naive Bayes (NB), logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost)] were tested. The optimal term-weighting scheme and classifier were TF-IDF-ICF and LR, respectively. Based on the evaluation criteria, their combination gave the best performance of all scheme and classifier combinations for the full environmental data analysis. Category classification performance differed according to the environmental section (climate, air, water, or waste/garbage), with the best performance achieved for climate and the poorest for water. This demonstrated the importance of selecting term-weighting schemes and ML classifiers in human-generated environmental big data analysis.
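As a rough illustration of how a class-aware scheme such as TF-IDF-ICF differs from plain TF, the sketch below weights each term by its count, its rarity across documents, and its rarity across classes. The formulas (`1 + log(N / df)` and `1 + log(C / cf)`), the function name `tf_idf_icf`, and the toy documents are assumptions chosen for illustration; they are not taken from the paper and may differ from the variant the authors used.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per tokenized document.

    Assumed (illustrative) formulas, not the paper's exact definition:
      tf  = raw count of the term in the document
      idf = 1 + log(N / df)   N documents, df documents containing the term
      icf = 1 + log(C / cf)   C classes, cf classes containing the term
    """
    n_docs = len(docs)
    n_classes = len(set(labels))
    doc_freq = Counter()   # term -> number of documents containing it
    class_terms = {}       # term -> set of class labels containing it
    for tokens, label in zip(docs, labels):
        unique = set(tokens)
        doc_freq.update(unique)
        for term in unique:
            class_terms.setdefault(term, set()).add(label)

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: count
            * (1 + math.log(n_docs / doc_freq[term]))
            * (1 + math.log(n_classes / len(class_terms[term])))
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus with two environmental sections.
docs = [
    "flood river water quality".split(),
    "river water pollution".split(),
    "carbon emissions climate policy".split(),
    "climate water policy".split(),
]
labels = ["water", "water", "climate", "climate"]
weights = tf_idf_icf(docs, labels)
```

In this toy corpus "river" appears only in the water class while "water" appears in both classes, so "river" receives the larger weight in the first document even though both terms occur once there. The resulting weight dictionaries could then be vectorized and passed to any of the classifiers named in the abstract.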
Pages: 11
Related papers
50 in total
  • [21] Improving automatic bug assignment using time-metadata in term-weighting
    Shokripour, Ramin
    Anvik, John
    Kasirun, Zarinah M.
    Zamani, Sima
    IET SOFTWARE, 2014, 8 (06) : 269 - 278
  • [22] A Comparison of Recent Information Retrieval Term-Weighting Models Using Ancient Datasets
    Alkilinc, Ahmet
    Arslan, Ahmet
    2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018,
  • [23] A Study of Term Weighting Schemes Using Class Information for Text Classification
    Ko, Youngjoong
    SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2012, : 1029 - 1030
  • [24] A Review on Machine Learning Big Data using R
    Prakash, M.
    Padmapriya, G.
    Kumar, M. Vinoth
    PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT), 2018, : 1873 - 1877
  • [25] Big Data Analytics using Machine Learning Techniques
    Mittal, Shweta
    Sangwan, Om Prakash
    2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019, : 203 - 207
  • [26] Students' Orientation Using Machine Learning and Big Data
    Ouatik, Farouk
    Erritali, Mohammed
    Ouatik, Fahd
    Jourhmane, Mostafa
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2021, 17 (01) : 111 - 119
  • [27] Big Data Platform Configuration Using Machine Learning
    Yeh, Chao-Chun
    Lu, Han-Lin
    Zhou, Jiazheng
    Chang, Sheng-An
    Lin, Xuan-Yi
    Sun, Yi-Chiao
    Huang, Shih-Kun
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2020, 36 (03) : 469 - 493
  • [28] Comparative Analysis of Intrusion Detection Models using Big Data Analytics and Machine Learning Techniques
    Alaketu, Muyideen Ayodeji
    Oguntimilehin, Abiodun
    Olatunji, Kehinde Adebola
    Abiola, Oluwatoyin Bunmi
    Badeji-Ajisafe, Bukola
    Akinduyite, Christiana Olanike
    Obamiyi, Stephen Eyitayo
    Babalola, Gbemisola Olutosin
    Okebule, Toyin
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2024, 21 (02) : 326 - 337
  • [29] Machine Learning in Big Data
    Wang, Lidong
    Alexander, Cheryl Ann
    INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2016, 1 (02) : 52 - 61
  • [30] Machine Learning on Big Data
    Condie, Tyson
    Mineiro, Paul
    Polyzotis, Neoklis
    Weimer, Markus
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 1242 - 1244